Grep for a token outside of the last line - grep

This seems like it should be easy, but I can't get the right syntax. I'm trying to use grep to find all files with a certain token that have at least one line following the line with the token in it. So something like:
blah blah token blah
another line here
would be found by the grep, but not:
blah blah token blah

I'm pretty sure grep doesn't do multi-line stuff like this (though there is a -A option to print trailing context). One way would be to use head to cut off the last line first, then pipe it to grep:
for f in *; do
echo "$f:"
head -n -1 "$f" | grep token
done
Above code is not tested, I'm stuck on Windows atm.
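A minimal sketch of the same idea, assuming a POSIX shell. The helper name `has_token_before_last_line` and the two sample files are made up for illustration. It drops the last line with `sed '$d'` instead of `head -n -1`, because negative line counts are a GNU coreutils extension and may not work elsewhere.

```shell
# Hypothetical helper: succeeds if the token occurs on any line except the last.
has_token_before_last_line() {
    token=$1
    file=$2
    # sed '$d' deletes the last line; grep -q only reports match/no match.
    sed '$d' "$file" | grep -q -e "$token"
}

# Sample files mirroring the two cases from the question.
demo_dir=$(mktemp -d)
printf 'blah blah token blah\nanother line here\n' > "$demo_dir/two_lines.txt"
printf 'blah blah token blah\n' > "$demo_dir/one_line.txt"

found=""
for f in "$demo_dir"/*; do
    if has_token_before_last_line token "$f"; then
        found="$found $(basename "$f")"
    fi
done

echo "matching files:$found"
rm -r "$demo_dir"
```

Only `two_lines.txt` should be reported, since the token in `one_line.txt` sits on the last line.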

Related

Use shell variable in grep lookahead in csh

I am trying to utilize a grep lookahead to get a value at the end of a line for a project I'm working on. The main issue I'm having is that I'm not sure how to use a shell variable in the grep lookahead syntax in cshell.
Here's the gist of what I'm trying to do.
There will be a dogfile.txt with several lines listing the names of dogs in the format below
genericDog2033, pomeranian
genericDog2034, greatDane
genericDog2035, Doberman
I wanted a way of retrieving the breed of the dog after the comma on each line so I thought a grep lookahead might be a good way of doing it. The project I'm working on isn't so hard-coded however, so I have no way of knowing what genericDog number I am searching for. There will be a shell variable in a greater while loop which will have access to the dog name.
For example if I set the dogNumber variable to the first dog in the file like so:
set dogNumber = genericDog2033
I then try to access the value of dogNumber in the grep lookahead
set dogBreed = `cat File.txt | grep -oP '(?<=$dogNumber ,)[^ ]*'`
The problem with the line above is that I think grep is looking for the literal string "$dogNumber ," in the file, which obviously doesn't exist. Is there some sort of wrapper I can put around the shell variable so cshell knows that dogNumber is a variable? I'm also open to other methods of doing this. Any help would be appreciated; this is literally the last line of code I need to finish my project and I'm at my wits' end.
Variable expansion happens inside double quotes ("), but not inside single quotes ('):
% set var = 'hello'
% echo '$var'
$var
% echo "$var"
hello
Furthermore, you have an error in your regexp:
(?<=$dogNumber ,)[^ ]*
In your data, the space is after the comma, not before.
% set dogNumber = genericDog2033
% set dogBreed = `cat a | grep -oP "(?<=$dogNumber, )[^ ]*"`
% echo $dogBreed
pomeranian
The easiest way to debug this is to not use variables at all in the first place, and simply check if the grep works:
% grep -oP "(?<=genericDog2034 ,)[^ ].*" a
[no output]
First make the grep work with static data, then substitute the shell variable into the pattern, and finally assign the result to a variable.
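Since the question is open to other methods, a lookbehind is not strictly needed. The sketch below is written for a POSIX shell rather than csh, so the assignment syntax differs from the question; the data comes from the question and the temporary file is only for illustration. It swaps grep for awk, which compares the first field as a plain string, so the variable needs no regex escaping.

```shell
# Recreate the sample data from the question in a temporary file.
dogfile=$(mktemp)
cat > "$dogfile" <<'EOF'
genericDog2033, pomeranian
genericDog2034, greatDane
genericDog2035, Doberman
EOF

dogNumber=genericDog2033

# -F', ' splits each line on comma+space; -v passes the shell variable to awk.
dogBreed=$(awk -F', ' -v name="$dogNumber" '$1 == name { print $2 }' "$dogfile")

echo "$dogBreed"   # prints: pomeranian
rm -f "$dogfile"
```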

Grep last match until the end

I've looked around StackExchange sites but I haven't found anything that's quite what I'm looking for. Here are two use cases of grep:
Printing items before/after a match
Print a certain match
I'm trying to parse a log file, and I want to return the last error in the log which is, predictably, at the end of the file. However, sometimes the errors are multiple lines. The answers for 'how to grep the last match' all involve either tail or head, and only work with a single line.
In my case, I want to simply return everything in the file, starting with the last match. Typically, this won't be any more than 10-15 lines maximum, so a grep -A 15 does the trick there. But, I still need to only get the last one of these, so that alone doesn't produce the right output.
The naive approach is to use a two-part match, to first get what the last match is and then everything after that. This won't work for me, because I can't guarantee that the last match is unique.
Is it possible to do this with grep somehow, or would there be better tools for this?
There is a way to get sed to do this on its own, but I can't remember it.
If you are open to using a combination of commands, here is something that might work:
# Get the line number of the last match
LNO=$( grep -n 'the error' the_file | tail -1 | cut -d":" -f1 )
# Now use sed to print all lines from that point:
sed -n "$LNO,\$p" the_file
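One caveat with the two commands above: if nothing matches, the line number is empty and sed receives an invalid address. A hedged sketch that wraps the same pipeline in a function with a guard follows; the function name `print_from_last_match` and the sample log are made up for illustration.

```shell
# Hypothetical helper: print from the last line matching $1 to the end of file $2.
print_from_last_match() {
    pattern=$1
    file=$2
    # grep -n prefixes each match with "LINE:"; keep only the last line number.
    last=$(grep -n -e "$pattern" "$file" | tail -n 1 | cut -d: -f1)
    # Without a match $last is empty and sed would fail, so bail out early.
    [ -n "$last" ] || return 1
    sed -n "${last},\$p" "$file"
}

# Made-up sample log with two "error" lines.
log=$(mktemp)
printf '%s\n' foo123 'error 1' xyz 'error 2' 99999 88888 > "$log"

result=$(print_from_last_match error "$log")
echo "$result"
rm -f "$log"
```

Because the match is located by line number, this works even when the text of the last match is not unique.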
I think there's an exact duplicate somewhere, but I found only these close ones:
How to get lines from the last match to the end of file?
grep last match and it's following lines
Here's one way to do it:
$ cat ip.txt
foo123
error 1
xyz
error 2
99999
88888
$ tac ip.txt | sed '/error/q' | tac
error 2
99999
88888
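tac is part of GNU coreutils and may be missing on some systems. As a hedged alternative, the sketch below swaps in a single awk pass that restarts a buffer at every match, so only the text from the last match onward is left at the end. The variable names are illustrative, and the whole tail block is held in memory, which should be fine for the 10-15 lines mentioned in the question.

```shell
# Same sample data as ip.txt above, written to a temporary file.
sample=$(mktemp)
printf '%s\n' foo123 'error 1' xyz 'error 2' 99999 88888 > "$sample"

tail_block=$(awk '
    /error/ { buf = ""; seen = 1 }   # new match: discard what was buffered
    seen    { buf = buf $0 "\n" }    # buffer lines once a match was seen
    END     { printf "%s", buf }     # emit the block after the last match
' "$sample")

echo "$tail_block"
rm -f "$sample"
```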

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or whether it will consider the next "/" a new word. The directories listed could change, so I can't assume the same number of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory in its output. Thanks for any help you can provide!
If your directory paths do not contain spaces, then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, consider sed, Awk, Perl, or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary assertion: \b.
I know you said to use grep, but I can't help but mention that this is trivially done using awk:
awk '{ print $NF }' input.txt
This assumes that whitespace is the delimiter and that the path does not contain any whitespace.
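To get the result into a variable, as the question asks, command substitution can wrap any of the commands above. The following is a sketch under the same assumption that paths contain no spaces; the second input line is made up to show the multi-line case.

```shell
# Sample input: the first line is from the question, the second is made up.
output='.MHuj.5.. /var/log/messages
S.5....T. /etc/ssh/sshd_config'

# [^ ]*$ matches the run of non-space characters at the end of each line.
paths=$(printf '%s\n' "$output" | grep -o '[^ ]*$')

# Handle each extracted path separately.
printf '%s\n' "$paths" | while IFS= read -r path; do
    echo "found: $path"
done
```

With several input lines, `paths` holds one path per line, so it is read line by line rather than treated as a single value.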

sed add additional column

I want to add an additional column of ones to a tab separated file.
The file looks like this:
#> cat /tmp/myfile
Aal Fisch_und_Fleisch
Aalsuppe Fisch_und_Fleisch
I wanted to do it with sed, matching the whole line and printing it out together with the new column. However, the additional column is written in the middle of the lines instead of at the end:
#> cat /tmp/myfile | sed 's#^\(.*\)$#\1\t1#g'
Aal 1isch_und_Fleisch
Aalsuppe1 Fisch_und_Fleisch
When I do a sanity check with some manually created lines it works, though:
#> echo -e "aaaaaaaaaa\taaaaaaaaaaaa\nbbbbbbb\tbbbbbbbb" | sed 's#^\(.*\)$#\1\t1#g'
aaaaaaaaaa aaaaaaaaaaaa 1
bbbbbbb bbbbbbbb 1
I guessed it might be an encoding/line break issue; here is what file says:
#> file /tmp/myfile
/tmp/myfile: ASCII text, with CRLF line terminators
If it is an encoding/line break issue, how do I go about it?
I'm not able to reproduce your exact issue, but I have seen similar things before. Essentially, CRLF line endings can cause strangeness in the visual display, because the CR part, the carriage return, moves the cursor to the beginning of the same line rather than to the beginning of a new line. Anything printed after the CR then overwrites the start of that line. The easiest fix is probably to switch to Unix-style endings.
To switch to Unix-style endings, use one of
dos2unix
tr -d '\r'
As a whole, something like
cat /tmp/myfile | dos2unix | sed 's#^\(.*\)$#\1\t1#g'
If you need to switch back, you could use unix2dos.
This might work for you (GNU sed):
sed 's/$/\t1/' file
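Putting the pieces together, here is a minimal sketch that strips the carriage returns before appending the column. The temporary file stands in for /tmp/myfile, and the `tab` variable is an assumption added so the sed script does not depend on `\t` being understood in the replacement, which is a GNU sed feature.

```shell
# Recreate the sample file with CRLF line endings.
myfile=$(mktemp)
printf 'Aal\tFisch_und_Fleisch\r\nAalsuppe\tFisch_und_Fleisch\r\n' > "$myfile"

tab=$(printf '\t')   # literal tab character

# Strip carriage returns first, then append the new column to every line.
with_column=$(tr -d '\r' < "$myfile" | sed "s/\$/${tab}1/")

printf '%s\n' "$with_column"
rm -f "$myfile"
```

Without the `tr -d '\r'` step, the new column would land after the carriage return and appear to overwrite the start of each line, as in the question.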

Opposite of "only-matching" in grep?

Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
The -v flag is not the answer, since it would hide lines containing the match entirely; I want to show those lines, just without the part that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"
This is a rather unusual requirement; I don't think grep can alter the strings like that. You can achieve this with sed, though:
sed -n "s/$PATTERN//gp" file
EDIT in response to OP's edit:
You can do multiline matching with sed, too, if the file is small enough to load it all into memory:
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml
You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'
I don't think there is a way in grep.
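A small worked example of the sed approach, with made-up input lines and a made-up pattern. It assumes the pattern contains no `/`, since that character is used as the delimiter of the `s` command.

```shell
# Made-up input: two lines contain digits, one does not.
input=$(mktemp)
printf '%s\n' 'foo123bar' 'no digits here' 'x9y' > "$input"

PATTERN='[0-9][0-9]*'

# Double quotes let the shell expand $PATTERN before sed sees the script.
# -n with the p flag prints only lines where a substitution happened.
remainder=$(sed -n "s/$PATTERN//gp" "$input")

printf '%s\n' "$remainder"
rm -f "$input"
```

This prints `foobar` and `xy`: the matching lines with the matched part removed, while the line without a match is dropped.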
If you use ack, you can output Perl's special variables $` and $' to show everything before and after the match, respectively:
ack string --output="\$\`\$'"
Similarly, if you want to output what did match along with other text, you can use $&, which contains the matched string:
ack string --output="Matched: $&"

Resources