grep last match and its following lines - grep

I've learnt how to grep lines before and after the match and to grep the last match but I haven't discovered how to grep the last match and the lines underneath it.
The scenario is a server log.
I want to list the dynamic output of a command, and the command is likely to be run several times in one server log. So I imagine the match would be the command, and that grep's -A could be combined with some other flag, or with some variation of tail, to get the outcome I'm seeking.

The approach I would take is to reverse the problem, as it's easier to find the first match and print the context lines. Take the file:
$ cat file
foo
1
2
foo
3
4
foo
5
6
Say we want the last match of foo and the following two lines: we could just reverse the file with tac, find the first match and the n lines above it using -Bn, and stop after that match using -m1. Then simply re-reverse the output with tac:
$ tac file | grep foo -B2 -m1 | tac
foo
5
6
Tools like tac and rev can make problems that seem difficult much easier.

Using awk instead:
awk '/pattern/{m=$0;l=NR}l+1==NR{n=$0}END{print m;print n}' foo.log
A small test: find the last line matching /8/ and the line after it:
kent$ seq 20|awk '/8/{m=$0;l=NR}l+1==NR{n=$0}END{print m;print n}'
18
19
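The awk one-liner above keeps only one following line. A minimal single-pass sketch that generalizes it to the last match plus N following lines, without needing tac, could look like this; the function name last_match_after is hypothetical, and its first argument is used as an awk extended regexp (note that awk -v also interprets backslash escapes in it):

```shell
# Hypothetical helper: print the last line matching the regexp $1 plus up to
# $2 following lines. Reads the files given after the first two args, or stdin.
last_match_after() {
  re=$1 n=$2
  shift 2
  awk -v re="$re" -v n="$n" '
    $0 ~ re  { buf = $0; left = n; found = 1; next }  # new match: restart the buffer
    left > 0 { buf = buf ORS $0; left-- }             # collect the following lines
    END      { if (found) print buf }                 # only the last match survives
  ' "$@"
}
```

For example, seq 20 | last_match_after 8 1 should print 18 and 19, the same result as the test run above.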

Related

Is it possible to show all lines after a match with grep/ripgrep?

Question: I'd like to print a single line directly following a line that contains a matching pattern.
My version of sed will not take the following syntax (it bombs out on +1p) which would seem like a simple solution:
sed -n '/ABC/,+1p' infile
I assume awk would be better to do multiline processing, but I am not sure how to do it.
Never use the word "pattern" in this context as it is ambiguous. Always use "string" or "regexp" (or in shell "globbing pattern"), whichever it is you really mean. See How do I find the text that matches a pattern? for more about that.
The specific answer you want is:
awk 'f{print;f=0} /regexp/{f=1}' file
or specializing the more general solution of the Nth record after a regexp (idiom "c" below):
awk 'c&&!--c; /regexp/{c=1}' file
The following idioms describe how to select a range of records given a specific regexp to match:
a) Print all records from some regexp:
awk '/regexp/{f=1}f' file
b) Print all records after some regexp:
awk 'f;/regexp/{f=1}' file
c) Print the Nth record after some regexp:
awk 'c&&!--c;/regexp/{c=N}' file
d) Print every record except the Nth record after some regexp:
awk 'c&&!--c{next}/regexp/{c=N}1' file
e) Print the N records after some regexp:
awk 'c&&c--;/regexp/{c=N}' file
f) Print every record except the N records after some regexp:
awk 'c&&c--{next}/regexp/{c=N}1' file
g) Print the N records from some regexp:
awk '/regexp/{c=N}c&&c--' file
I changed the variable name from "f" for "found" to "c" for "count" where appropriate, as that's more expressive of what the variable actually IS.
f is short for found. It's a boolean flag that I set to 1 (true) when I find a string matching the regular expression regexp in the input (/regexp/{f=1}). Everywhere else f appears on its own in these scripts, it is being tested as a condition, and when it is true awk executes its default action of printing the current record. So input records only get output after we have seen regexp and set f to 1/true.
c && c-- { foo } means "if c is non-zero, then execute foo and decrement c". c-- is a post-decrement, so the condition tests the value c had before it was decremented. So if c starts at 3, on that input line the test sees 3, c is decremented to 2 and foo is executed; on the next input line the test sees 2, c becomes 1 and foo is executed again; on the next input line the test sees 1, c becomes 0 and foo is executed a third time; from then on c is 0, the && short-circuits, and foo is not executed and c is not decremented any further. That is why idiom e prints exactly N records. The pre-decrement form c && !--c used in idioms c and d is true only when c drops from 1 to 0, i.e. on the Nth record. We do c && c-- instead of just testing for c-- > 0 so we can't run into a case with a huge input file where c hits zero and continues getting decremented so often it wraps around and becomes positive again.
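To see the difference between the post-decrement test c&&c-- and the pre-decrement test c&&!--c, the idioms can be run on numbered input. This sketch uses seq 10, the regexp ^3$ and N=2, all of which are arbitrary choices:

```shell
# Idiom e: the N records after the match; prints 4 and 5
seq 10 | awk 'c&&c--;/^3$/{c=2}'
# Idiom c: only the Nth record after the match; prints 5
seq 10 | awk 'c&&!--c;/^3$/{c=2}'
# Idiom g: the N records starting with the match; prints 3 and 4
seq 10 | awk '/^3$/{c=2}c&&c--'
```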
It's the line after that match that you're interested in, right? In sed, that could be accomplished like so:
sed -n '/ABC/{n;p}' infile
Alternatively, grep's A option might be what you're looking for.
-A NUM, Print NUM lines of trailing context after matching lines.
For example, given the following input file:
foo
bar
baz
bash
bongo
You could use the following:
$ grep -A 1 "bar" file
bar
baz
$ sed -n '/bar/{n;p}' file
baz
I needed to print ALL lines after the pattern (OK Ed, regexp), so I settled on this one:
sed -n '/pattern/,$p' # prints all lines after (and including) the pattern
But since I wanted to print all the lines AFTER the pattern (excluding the matching line itself):
sed -n '/pattern/,$p' | tail -n +2 # all lines after first occurrence of pattern
I suppose in your case you can add a head -n 1 at the end:
sed -n '/pattern/,$p' | tail -n +2 | head -n 1 # prints line after pattern
And I really should include tlwhitec's comment in this answer, since their sed-only approach is more elegant than my suggestions:
sed '0,/pattern/d'
The above script deletes every line from the first line up to and including the first line that matches the pattern; all lines after that are printed. (The 0,/regexp/ address form is a GNU sed extension.)
awk Version:
awk '/regexp/ { getline; print $0; }' filetosearch
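If the pattern matches, append the next line to the pattern space (N), delete everything up to and including the newline, then quit; quitting prints the pattern space as a side effect. Every other line is deleted (d).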
sed '/pattern/ { N; s/.*\n//; q }; d'
Actually, sed -n '/pattern/{n;p}' filename will fail if the pattern matches consecutive lines:
$ seq 15 |sed -n '/1/{n;p}'
2
11
13
15
The expected output is:
2
11
12
13
14
15
My solution is:
$ sed -n -r 'x;/_/{x;p;x};x;/pattern/!s/.*//;/pattern/s/.*/_/;h' filename
For example:
$ seq 15 |sed -n -r 'x;/_/{x;p;x};x;/1/!s/.*//;/1/s/.*/_/;h'
2
11
12
13
14
15
Explanation:
x;: at the start of each input line, use x to exchange the contents of the pattern space and the hold space.
/_/{x;p;x};: if the pattern space (which now holds what was the hold space) contains _ (an indicator of whether the previous line matched the pattern), use x to bring the current line back into the pattern space, p to print it, and x again to undo the swap.
x: swap back, so the pattern space again holds the current line.
/pattern/!s/.*//: if the current line does NOT match pattern, which means we should NOT print the next line, use s/.*// to empty the pattern space.
/pattern/s/.*/_/: if the current line matches pattern, which means we should print the next line, set an indicator for the next cycle: s/.*/_/ replaces the whole pattern space with _ (the second command checks for it to decide whether the previous line matched).
h: overwrite the hold space with the pattern space; the hold space now contains either ^_$, meaning the current line matched the pattern, or ^$, meaning it did not.
The fifth and sixth steps cannot be swapped: after s/.*/_/ the pattern space no longer matches /pattern/, so a following /pattern/!s/.*// would also run and erase the _ indicator.
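For comparison, the awk idiom awk 'f{print;f=0} /regexp/{f=1}' given earlier also copes with consecutive matches: f is cleared after printing, and then set again if the printed line itself matches. A quick check with the same seq 15 input and /1/ regexp (a sketch, not part of the sed answer):

```shell
# Print every line that follows a line matching /1/, including runs of matches;
# expected output: 2, 11, 12, 13, 14, 15
seq 15 | awk 'f{print;f=0} /1/{f=1}'
```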
This might work for you (GNU sed):
sed -n ':a;/regexp/{n;h;p;x;ba}' file
Use sed's grep-like option -n, and if the current line contains the required regexp, replace the current line with the next, copy that line to the hold space (HS), print the line, swap the pattern space (PS) for the HS and repeat.
Piping some greps can do it (it runs in POSIX shell and under BusyBox):
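grep -A1 my-regexp my-file | grep -vx -- '--' | grep -v my-regexp
-v will show non-matching lines
-x restricts the separator filter to lines consisting exactly of --, and the first -- ends option parsing so that '--' is taken as the pattern
-- is printed by grep on a line of its own to separate non-adjacent groups of matches, so we skip that too (note that the last grep also drops any following line that itself matches my-regexp)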
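A small check of the separator filter, assuming a grep with -A support (GNU grep or BusyBox). The input lines are made up, and the last one deliberately contains -- so it can be seen that grep -vx -- '--' removes only the separator lines that grep inserts between groups:

```shell
# Made-up input: two "key" lines, each followed by the line we want to keep.
# The unrelated "other" line makes grep emit a -- separator between the groups.
printf 'key 1\nvalue a\nother\nkey 2\nuse --force\n' |
  grep -A1 key | grep -vx -- '--' | grep -v key
# Prints "value a" and "use --force"; a plain grep -v -- '--' would drop the second
```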
If you just want the next line after a pattern, this sed command will work:
sed -n -e '/pattern/{n;p;}'
-n suppresses output (quiet mode);
-e denotes a sed command (not required in this case);
/pattern/ is a regex search for lines containing the literal combination of the characters pattern (use /^pattern$/ for a line consisting only of “pattern”);
n replaces the pattern space with the next line;
p prints;
For example:
seq 10 | sed -n -e '/5/{n;p;}'
Note that the above command will print a single line after every line containing pattern. If you just want the first one use sed -n -e '/pattern/{n;p;q;}'. This is also more efficient as the whole file is not read.
This sed-only command will print all lines after your pattern:
sed -n '/pattern/,${/pattern/!p;}'
Formatted as a sed script this would be:
/pattern/,${
/pattern/!p
}
Here’s a short example:
seq 10 | sed -n '/5/,${/5/!p;}'
/pattern/,$ will select all the lines from pattern to the end of the file.
{} groups the next set of commands (c-like block command)
/pattern/!p; prints lines that don’t match pattern. Note that the ; is required in early versions of sed and in some non-GNU versions. This turns the instruction into an exclusive range; sed ranges are normally inclusive at both the start and the end.
To exclude the end of range you could do something like this:
sed -n '/pattern/,/endpattern/{/pattern/!{/endpattern/d;p;}}'
Formatted as a sed script:
/pattern/,/endpattern/{
/pattern/!{
/endpattern/d
p
}
}
/endpattern/d deletes the line matching endpattern from the “pattern space” and starts the next cycle, so the p command is skipped for that line.
Another pithy example:
seq 10 | sed -n '/5/,/8/{/5/!{/8/d;p}}'
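The same exclusive range can also be sketched with an awk flag, which some readers may find easier to follow than nested sed blocks; this is an alternative technique rather than part of the sed answer:

```shell
# f is set after a line matching /5/ and cleared on a line matching /8/,
# so only the lines strictly between them are printed (here: 6 and 7)
seq 10 | awk '/8/{f=0} f; /5/{f=1}'
```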
If you have GNU sed you can add the debug switch:
seq 5 | sed -n --debug '/2/,/4/{/2/!{/4/d;p}}'
Output:
SED PROGRAM:
/2/,/4/ {
/2/! {
/4/ d
p
}
}
INPUT: 'STDIN' line 1
PATTERN: 1
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 2
PATTERN: 2
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 3
PATTERN: 3
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
COMMAND: p
3
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 4
PATTERN: 4
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
END-OF-CYCLE:
INPUT: 'STDIN' line 5
PATTERN: 5
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:

grep or ripgrep: How to find only files that match multiple patterns (not only on the same line)?

I'm searching for a fast method to find all files in a folder which contain 2 or more patterns
grep -l -e foo -e bar ./*
or
rg -l -e foo -e bar
show all files containing 'foo' AND 'bar' in the same line or 'foo' OR 'bar' in different lines but I want only files that have at a minimum one 'foo' match AND one 'bar' match in different lines. Files which only have 'foo' matches or only 'bar' matches shall be filtered out.
I know I could chain the grep calls but this will be too slow.
rg with multiline mode does work; however, it prints everything in between the matches, which is sometimes not useful.
For the use case of chaining searches (e.g. in HTML, JSON, etc.), where the first criterion only narrows down the files and the second criterion is what I am actually looking for, this is a possible solution:
rg -0 -l crit1 | xargs -0 -I % rg -H crit2 %
Alternatively I have just discovered ugrep which supports combining multiple criteria using boolean operators both on line and file level. This is quite something. It's a bit slower than rg + xargs, however it prints nicely all lines matching all criteria from the files (instead of just showing the last criteria from above):
ugrep --files -e crit1 --and -e crit2
If you want to search for two or more words that occur on multiple lines, you can use ripgrep's option --multiline-dotall in addition to -U/--multiline. You also need to search for foo before bar and bar before foo using the | operator:
rg -lU --multiline-dotall 'foo.*bar|bar.*foo' .
For any number of words you'll need to OR together (|) all permutations of those words. For that I use a small Python script (which I called rga) that searches the current directory (and downwards) for files that contain all arguments given on the command line:
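#!/usr/bin/env python3
# rga: list files under the current directory that contain all given words
import subprocess
import sys
from itertools import permutations
# Join each ordering of the words with '.*' and OR them: foo bar -> foo.*bar|bar.*foo
rgarg = '|'.join('.*'.join(x) for x in permutations(sys.argv[1:]))
cmd = ['rg', '-lU', '--multiline-dotall', rgarg, '.']
# print(' '.join(cmd))  # uncomment to see the generated command
proc = subprocess.run(cmd, capture_output=True, text=True)
sys.stdout.write(proc.stdout)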
I have searched successfully with six arguments; above that the command line becomes too long, since the generated pattern grows factorially (seven words already give 5040 permutations). There are probably ways around that by saving the argument to a file and adding -f file_name, but I never needed/investigated that.
$ cat f1
afoot
2bar
$ cat f2
foo bar
$ cat f3
foot
$ cat f4
bar
$ cat f5
barred
123
foo3
$ rg -Ul '(?s)foo.*?\n.*?bar|bar.*?\n.*?foo'
f5
f1
You can use -U option to match across lines. The s flag will enable . to match newlines as well. Since you want the matches to be across different lines, you need to match a newline character in between the search terms as well.
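Where ripgrep's multiline mode is not an option, a portable single-pass sketch in awk can apply the question's rule for two regexps: one line matching the first and a different line matching the second. The function name files_with_both is hypothetical, the patterns are awk extended regexps, and a line containing both counts as only one of the two required lines:

```shell
# Hypothetical helper: print each file (args after $1 and $2) that has a line
# matching $1 and a different line matching $2.
files_with_both() {
  a=$1 b=$2
  shift 2
  awk -v a="$a" -v b="$b" '
    FNR == 1 { fo = bo = bb = done = 0 }  # reset per-file counters
    done { next }                         # file already reported
    {
      ma = ($0 ~ a); mb = ($0 ~ b)
      if (ma && mb) bb++                  # line with both patterns
      else if (ma) fo++                   # line with only the first
      else if (mb) bo++                   # line with only the second
    }
    # two distinct lines can supply the two patterns
    (fo && (bo || bb)) || (bo && bb) || bb >= 2 { print FILENAME; done = 1 }
  ' "$@"
}
```

With the sample files f1 to f5 above, files_with_both foo bar f1 f2 f3 f4 f5 should print f1 and f5.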
So this doesn't perfectly answer the question, but this is the StackOverflow question that pops up every time I google "ripgrep multiple patterns". So I'm leaving my answer here for the future googler (including myself)...
I primarily work in PowerShell, so this is how I perform an AND search with ripgrep in PowerShell. It also accepts files where both patterns occur on the same line, which is why it's not a perfect answer, but it will identify files that match both patterns, and it runs relatively quickly:
rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern')
Explanation:
First the parens run: rg -l 'FirstSearchPattern', which searches all files for the pattern FirstSearchPattern. By using -l it returns a list of file paths only.
By placing it in (parentheses), it runs the whole command first, then "splats" the results of the command into the external rg command.
The external rg command is now run like this:
rg -l 'SecondSearchPattern' "file.txt" "directory\file.txt"
And yes, it does put them into quotes, so it handles paths with spaces. This searches all provided files that match the pattern SecondSearchPattern. Thus returning only files that match both patterns.
You can go one step further and add on | Get-Item (| gi) to return filesystem objects, and | % FullName to get the full path.
rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern') | gi | % FullName

How to count total number of matches of regular expression per file on AIX

Grep is usually used to display the lines containing a match of the specified pattern. Is there any way in AIX to display the total number of matches of the pattern in each file searched? That is to say, every match in every line should be counted.
I tried grep -c pattern filename, but that only counts each matching line once however many matches it contains.
grep -o foo filename.txt | wc -l
Finding the 3 occurrences of b. in this file:
$ cat file
a bc d be f
bg h
$ awk '{c+=gsub(/b./,"")} END{print c+0}' file
3
The above will work with any awk on any OS (except old, broken awk of course).
You need to match the patterns first, then count the number of matches.
The -o switch will yield each match on a new line.
Then just count the total number of lines.
Something like:
grep -o pattern filename | wc -l
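Because the question asks for a total per file, the awk gsub() approach above can be wrapped in a small loop. This is a sketch that assumes a POSIX awk and a regexp that cannot match the empty string; the function name count_matches is hypothetical:

```shell
# Hypothetical helper: print "file: count" with the number of matches of the
# ERE $1 in each file given after it (every match on every line is counted).
count_matches() {
  re=$1
  shift
  for f in "$@"; do
    # gsub(re, "&") replaces each match with itself and returns how many it found
    n=$(awk -v re="$re" '{ c += gsub(re, "&") } END { print c + 0 }' "$f")
    printf '%s: %s\n' "$f" "$n"
  done
}
```

For the example file above, count_matches 'b.' file should print file: 3.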

grep in a textfile all lines containing 'xxx' and the previous line

I want to print all lines in a tomcat catalina.out log containing xxx. A simple thing to accomplish using:
cat catalina.out | grep xxx
However, in the log file, the line above each line containing xxx holds the date and time when the item was logged. I would like to see those lines above the grepped lines too. How could I accomplish this?
grep -B1
-B[n] lets you see [n] lines before each line that matches the pattern you are looking for.
You can also use -A for 'lines after', and -C for 'context' (lines both above and below).
You can also simplify your grep call and remove the pipe with grep xxx -B1 catalina.out.
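-B is not specified by POSIX, so on a grep that lacks it, a minimal awk sketch can remember the previous line instead. The function name with_previous is hypothetical; unlike grep -B1 it prints no -- separators, and when two consecutive lines match, the first of them is printed twice (once as a match and once as the line before the next match):

```shell
# Hypothetical helper: print each line matching the regexp $1, preceded by the
# line before it. Reads the files given after $1, or stdin.
with_previous() {
  re=$1
  shift
  awk -v re="$re" '
    $0 ~ re { if (NR > 1) print prev; print }  # previous line, then the match
    { prev = $0 }                              # remember every line
  ' "$@"
}
# usage: with_previous xxx catalina.out
```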

How to truncate long matching lines returned by grep or ack

I want to run ack or grep on HTML files that often have very long lines. I don't want to see very long lines that wrap repeatedly. But I do want to see just that portion of a long line that surrounds a string that matches the regular expression. How can I get this using any combination of Unix tools?
You could use the grep options -oE, possibly in combination with changing your pattern to ".{0,10}<original pattern>.{0,10}" in order to see some context around it:
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
-E, --extended-regexp
Interpret pattern as an extended regular expression (i.e., force grep to behave as egrep).
For example (from @Renaud's comment):
grep -oE ".{0,10}mysearchstring.{0,10}" myfile.txt
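Where grep -o is not available, a rough awk equivalent can use match(), which sets RSTART and RLENGTH, to cut out the first match on each line with some context. A sketch; the function name context_around is hypothetical, and unlike grep -o it reports only the first match per line:

```shell
# Hypothetical helper: for each line matching the ERE $1, print the first match
# with up to $2 characters of context on each side (reads files or stdin).
context_around() {
  re=$1 n=$2
  shift 2
  awk -v re="$re" -v n="$n" 'match($0, re) {
    start = RSTART - n
    if (start < 1) start = 1
    # length = context before + match + context after; substr clips at end of line
    print substr($0, start, RSTART - start + RLENGTH + n)
  }' "$@"
}
```

For example, echo 0123456789ABCDEFerror0123456789XYZ | context_around error 3 should print DEFerror012.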
Alternatively, you could try -c:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines.
Pipe your results thru cut. I'm also considering adding a --cut switch so you could say --cut=80 and only get 80 columns.
You could use less as a pager for ack and chop long lines: ack --pager="less -S" This retains the long line but leaves it on one line instead of wrapping. To see more of the line, scroll left/right in less with the arrow keys.
I have the following alias setup for ack to do this:
alias ick='ack -i --pager="less -R -S"'
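grep -oE ".{0,10}error.{0,10}" mylogfile.txt
In the unusual situation where you cannot use -E, drop it and escape the braces so the pattern works as a basic regular expression: grep -o ".\{0,10\}error.\{0,10\}" mylogfile.txt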
cut -c 1-100
Explanation: this keeps only characters 1 to 100 of each line, e.g. when appended to a grep pipeline.
The Silver Searcher (ag) supports this natively via the --width NUM option. It replaces the rest of longer lines with [...].
Example (truncate after 120 characters):
$ ag --width 120 '@patternfly'
...
1:{"version":3,"file":"react-icons.js","sources":["../../node_modules/@patternfly/ [...]
In ack3, a similar feature is planned but currently not implemented.
Taken from: http://www.topbug.net/blog/2016/08/18/truncate-long-matching-lines-of-grep-a-solution-that-preserves-color/
The suggested approach ".{0,10}<original pattern>.{0,10}" is perfectly good except that the highlighting color is often messed up. I've created a script with similar output in which the color is also preserved:
#!/bin/bash
# Usage:
# grepl PATTERN [FILE]
# how many characters around the searching keyword should be shown?
context_length=10
# What is the length of the control character for the color before and after the
# matching string?
# This is mostly determined by the environmental variable GREP_COLORS.
control_length_before=$(($(echo a | grep --color=always a | cut -d a -f '1' | wc -c)-1))
control_length_after=$(($(echo a | grep --color=always a | cut -d a -f '2' | wc -c)-1))
grep -E --color=always "$1" $2 |
grep --color=none -oE \
".{0,$(($control_length_before + $context_length))}$1.{0,$(($control_length_after + $context_length))}"
Assuming the script is saved as grepl, then grepl pattern file_with_long_lines should display the matching lines but with only 10 characters around the matching string.
I put the following into my .bashrc:
grepl() {
$(which grep) --color=always "$@" | less -RS
}
You can then use grepl on the command line with any arguments that are available for grep. Use the arrow keys to see the tail of longer lines. Use q to quit.
Explanation:
grepl() {: Define a new function that will be available in every (new) bash console.
$(which grep): Get the full path of grep. (Ubuntu defines an alias for grep that is equivalent to grep --color=auto. We don't want that alias but the original grep.)
--color=always: Colorize the output. (--color=auto from the alias won't work since grep detects that the output is put into a pipe and won't color it then.)
"$@": Pass all arguments given to the grepl function through to grep, each as a separate word.
less: Display the lines using less
-R: Show colors
-S: Don't break long lines
Here's what I do:
function grep () {
tput rmam;
command grep "$@";
tput smam;
}
In my .bash_profile, I override grep so that it automatically runs tput rmam before and tput smam after, which disables line wrapping and then re-enables it.
ag can also take the regex trick, if you prefer it:
ag --column -o ".{0,20}error.{0,20}"
