Opposite of "only-matching" in grep? - grep

Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
That is, the -v flag is not the answer, since that would not show files containing the match at all, but I want to show these lines, but not the part of the line that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"

This is a rather unusual requirement, I don't think grep would alternate the strings like that. You can achieve this with sed, though:
sed -n 's/$PATTERN//gp' file
EDIT in response to OP's edit:
You can do multiline matching with sed, too, if the file is small enough to load it all into memory:
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml

You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'

I don't think there is a way in grep.
If you use ack, you could output Perl's special variables $` and $' variables to show everything before and after the match, respectively:
ack string --output="\$`\$'"
Similarly if you wanted to output what did match along with other text, you could use $& which contains the matched string;
ack string --output="Matched: $&"

Related

Find a string between two characters with grep

I have found on this answer the regex to find a string between two characters. In my case I want to find every pattern between ‘ and ’. Here's the regex :
(?<=‘)(.*?)(?=’)
Indeed, it works when I try it on https://regex101.com/.
The thing is I want to use it with grep but it doesn't work :
grep -E '(?<=‘)(.*?)(?=’)' file
Is there anything missing ?
Those are positive look-ahead and look behind assertions. You need to enable it using PCRE(Perl Compatible Regex) and perhaps its better to get only matching part using -o option in GNU grep:
grep -oP '(?<=‘)(.*?)(?=’)' file

Grep Filenames from ls for specific part of them

I want to extract a specific part out of the filenames to work with them.
Example:
ls -1
REZ-Name1,Surname1-02-04-2012.png
REZ-Name2,Surname2-07-08-2013.png
....
So I want to get only the part with the name.
How can this be achieved ?
There are several ways to do this. Here's a loop:
for file in REZ-*-??-??-????.png
do
name=${file#*-}
name=${name%-??-??-????.png}
echo "($name)"
done
Given a variety of filenames with all sorts of edge cases from spacing, additional hyphens and line feeds:
REZ-Anna-Maria,de-la-Cruz-12-32-2015.png
REZ-Bjørn,Dæhlie-01-01-2015.png
REZ-First,Last-12-32-2015.png
REZ-John Quincy,Adams-11-12-2014.png
REZ-Ridiculous example # this is one filename
is ridiculous,but fun-22-11-2000.png # spanning two lines
it outputs:
(Anna-Maria,de-la-Cruz)
(Bjørn,Dæhlie)
(First,Last)
(John Quincy,Adams)
(Ridiculous example
is ridiculous,but fun)
If you're less concerned with correctness, you can simplify it further:
$ ls | grep -o '[^-]*,[^-]*'
Maria,de
Bjørn,Dæhlie
First,Last
John Quincy,Adams
is ridiculous,but fun
In this case, cut makes more sense than grep:
ls -l | cut -f2 -d-
cut the second field from the input, using '-' as the field delimiter. That other guy's answer will correctly handle some cases mine will not, but for one off uses, I generally find the semantics of cut to be much easier to remember.

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same amount of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory list in it's output. Thanks for any help you can provide!
If your directory paths does not have spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary character class: \b.
I know you said to use grep, but I can't help to mention that this is trivially done using awk:
awk '{ print $NF }' input.txt
This is assuming that a whitespace is the delimiter and that the path does not contain any whitespaces.

How to filter using grep on a selected word

grep (GNU grep) 2.14
Hello,
I have a log file that I want to filter on a selected word. However, it tends to filter on many for example.
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
"P_SIP:N_iptB1T10"
"P_SIP:N_iptB1T11"
"P_SIP:N_iptB1T12"
etc
However, I don't want to display anything after the 1. grep is picking up 11, 12, 13, etc.
Many thanks for any suggestions,
You can restrict the word to end at 1:
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This will work assuming that you have a matching case which is only "P_SIP:N_iptB1T1".
But if you want to extract from P_SIP:N_iptB1T1x, and display only once, then you need to restrict to show only first match.
grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN
More info
At least two approaches can be tried:
grep -w pattern matches for full words. Seems to work for this case too, even though the pattern has punctuation.
grep pattern -m 1 to restrict the output to first match. (Also doable with grep xxx | head -1)
If the lines contains the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there's blank spaces or endlines after the word, you can match these too:
# The word is followed by one or more blanks
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file

How to replace a path with another path in sed? [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 9 months ago.
I have a csh script (although I can change languages if it has any relevance) where I have to:
sed s/AAA/BBB/ file
The problem is that AAA and BBB are paths, and so contain '/'. AAA is fixed, so I can say:
sed s/\\\/A\\\/A\\\A/BBB/ file
However, BBB is based on variables, including $PWD. How do I escape the '/' in $PWD?
OR is there some other way I should be doing this entirely?
sed can use any separator instead of / in the s command. Just use something that is not encountered in your paths:
s+AAA+BBB+
and so on.
Alternatively (and if you don't want to guess), you can pre-process your path with sed to escape the slashes:
pwdesc=$(echo $PWD | sed 's_/_\\/_g')
and then do what you need with $pwdesc.
In circumstances where the replacement string or pattern string contain slashes, you can make use of the fact that GNU sed allows an alternative delimiter for the substitute command. Common choices for the delimiter are the pipe character | or the hash # - the best choice of delimiting character will often depend on the type of file being processed. In your case you can try
sed -i 's#/path/to/AAA#/path/to/BBB#g' your_file
Note: The g after last # is to change all occurrences in file if you want to change first ouccurence do not use g
sed -i "s|$fileWithPath|HAHA|g" file
EDIT 1
sed -i 's|path/to/foo|path/to/bar|g' file
Using csh for serious scripting is usually not recommended. However, that is tangential to the issue at hand.
You're probably after something like:
sed -e "s=$oldpath=$newpath="
where the shell variable $oldpath contains the value to be replaced and $newpath contains the replacement, and it is assumed that neither variable contains an equals sign. That is, you're allowed to choose the delimiter on pattern, and avoiding the usual / delimiter avoids problems with slashes in pathnames. If you think = might appear in your file names, choose something less likely to appear, such as control-A or control-G.
In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
Can be also be written as
sed -i 's+playstation+PS4+' input.txt
sed : is stream editor
-i : Allows to edit the source file
+: Is delimiter.
I hope the above information works for you 😃.
You can use parenthesis expansion ${i/p/r} to escape the slashes.
In this case ${i//p/r} for escaping all occurrences.
$p1=${p1//\//\\/}
$p2=${p2//\//\\/}
sed s/$p1/$p2/ file
Or, more concise, in one line sed s/${p1//\//\\/}/${p2//\//\\/}/ file
The two fist slashes // are a separator in parenthesis expansion saying we are matching all occurrences, then \/ is for escaping the slash in the search template, the / as a second separator in the expansion, and then \\/ is the replacement, in witch the backslash must be escaped.
We just needed to get the /h/ network path references out of the path. if we pointed them back to the /c/ drive they would map to non-existant directories but resolve quickly. In my .bashrc I used
PATH=`echo $PATH | sed -e "s+/h/+/c/+g"`

Resources