grep count all matches in one line of single pattern

grep count all matches in one line of single pattern - grep

How can I count all matches with grep, when there are more than one match per line?
$ cat example.txt
int foo(int a, char b) { int i = 80; return i; }
$ grep -c "int" example.txt
1
I want to output 3 (since int appears 3 times in the file)

With grep that supports -o option:
$ grep -o 'int' ip.txt | wc -l
3
With ripgrep:
$ rg -oc 'int' ip.txt
3

The best is to use awk for this:
$ awk '{c+=gsub(/int/,"")}END{print c}' file
The function gsub is used to perform substitutions and returns the total amount of substitutions done. In the above, we replace the regular expression int with an empty string to do the counting. This is done for every line of file. Per line, we add the count to the variable c which is by default initialized to zero. At the END of the script, we print the value of c.

Consider this awk approach using search word a field separator:
awk -F 'int' '{print NF-1}' file
3

Related

Is it possible to show all lines after match with grep/ripgrep? [duplicate]

Question: I'd like to print a single line directly following a line that contains a matching pattern.
My version of sed will not take the following syntax (it bombs out on +1p) which would seem like a simple solution:
sed -n '/ABC/,+1p' infile
I assume awk would be better to do multiline processing, but I am not sure how to do it.

Never use the word "pattern" in this context as it is ambiguous. Always use "string" or "regexp" (or in shell "globbing pattern"), whichever it is you really mean. See How do I find the text that matches a pattern? for more about that.
The specific answer you want is:
awk 'f{print;f=0} /regexp/{f=1}' file
or specializing the more general solution of the Nth record after a regexp (idiom "c" below):
awk 'c&&!--c; /regexp/{c=1}' file
The following idioms describe how to select a range of records given a specific regexp to match:
a) Print all records from some regexp:
awk '/regexp/{f=1}f' file
b) Print all records after some regexp:
awk 'f;/regexp/{f=1}' file
c) Print the Nth record after some regexp:
awk 'c&&!--c;/regexp/{c=N}' file
d) Print every record except the Nth record after some regexp:
awk 'c&&!--c{next}/regexp/{c=N}1' file
e) Print the N records after some regexp:
awk 'c&&c--;/regexp/{c=N}' file
f) Print every record except the N records after some regexp:
awk 'c&&c--{next}/regexp/{c=N}1' file
g) Print the N records from some regexp:
awk '/regexp/{c=N}c&&c--' file
I changed the variable name from "f" for "found" to "c" for "count" where
appropriate as that's more expressive of what the variable actually IS.
f is short for found. Its a boolean flag that I'm setting to 1 (true) when I find a string matching the regular expression regexp in the input (/regexp/{f=1}). The other place you see f on its own in each script it's being tested as a condition and when true causes awk to execute its default action of printing the current record. So input records only get output after we see regexp and set f to 1/true.
c && c-- { foo } means "if c is non-zero then decrement it and if it's still non-zero then execute foo" so if c starts at 3 then it'll be decremented to 2 and then foo executed, and on the next input line c is now 2 so it'll be decremented to 1 and then foo executed again, and on the next input line c is now 1 so it'll be decremented to 0 but this time foo will not be executed because 0 is a false condition. We do c && c-- instead of just testing for c-- > 0 so we can't run into a case with a huge input file where c hits zero and continues getting decremented so often it wraps around and becomes positive again.

It's the line after that match that you're interesting in, right? In sed, that could be accomplished like so:
sed -n '/ABC/{n;p}' infile
Alternatively, grep's A option might be what you're looking for.
-A NUM, Print NUM lines of trailing context after matching lines.
For example, given the following input file:
foo
bar
baz
bash
bongo
You could use the following:
$ grep -A 1 "bar" file
bar
baz
$ sed -n '/bar/{n;p}' file
baz

I needed to print ALL lines after the pattern ( ok Ed, REGEX ), so I settled on this one:
sed -n '/pattern/,$p' # prints all lines after ( and including ) the pattern
But since I wanted to print all the lines AFTER ( and exclude the pattern )
sed -n '/pattern/,$p' | tail -n+2 # all lines after first occurrence of pattern
I suppose in your case you can add a head -1 at the end
sed -n '/pattern/,$p' | tail -n+2 | head -1 # prints line after pattern
And I really should include tlwhitec's comment in this answer (since their sed-strict approach is the more elegant than my suggestions):
sed '0,/pattern/d'
The above script deletes every line starting with the first and stopping with (and including) the line that matches the pattern. All lines after that are printed.

awk Version:
awk '/regexp/ { getline; print $0; }' filetosearch

If pattern match, copy next line into the pattern buffer, delete a return, then quit -- side effect is to print.
sed '/pattern/ { N; s/.*\n//; q }; d'

Actually sed -n '/pattern/{n;p}' filename will fail if the pattern match continuous lines:
$ seq 15 |sed -n '/1/{n;p}'
2
11
13
15
The expected answers should be:
2
11
12
13
14
15
My solution is:
$ sed -n -r 'x;/_/{x;p;x};x;/pattern/!s/.*//;/pattern/s/.*/_/;h' filename
For example:
$ seq 15 |sed -n -r 'x;/_/{x;p;x};x;/1/!s/.*//;/1/s/.*/_/;h'
2
11
12
13
14
15
Explains:
x;: at the beginning of each line from input, use x command to exchange the contents in pattern space & hold space.
/_/{x;p;x};: if pattern space, which is the hold space actually, contains _ (this is just a indicator indicating if last line matched the pattern or not), then use x to exchange the actual content of current line to pattern space, use p to print current line, and x to recover this operation.
x: recover the contents in pattern space and hold space.
/pattern/!s/.*//: if current line does NOT match pattern, which means we should NOT print the NEXT following line, then use s/.*// command to delete all contents in pattern space.
/pattern/s/.*/_/: if current line matches pattern, which means we should print the NEXT following line, then we need to set a indicator to tell sed to print NEXT line, so use s/.*/_/ to substitute all contents in pattern space to a _(the second command will use it to judge if last line matched the pattern or not).
h: overwrite the hold space with the contents in pattern space; then, the content in hold space is ^_$ which means current line matches the pattern, or ^$, which means current line does NOT match the pattern.
the fifth step and sixth step can NOT exchange, because after s/.*/_/, the pattern space can NOT match /pattern/, so the s/.*// MUST be executed!

This might work for you (GNU sed):
sed -n ':a;/regexp/{n;h;p;x;ba}' file
Use seds grep-like option -n and if the current line contains the required regexp replace the current line with the next, copy that line to the hold space (HS), print the line, swap the pattern space (PS) for the HS and repeat.

Piping some greps can do it (it runs in POSIX shell and under BusyBox):
cat my-file | grep -A1 my-regexp | grep -v -- '--' | grep -v my-regexp
-v will show non-matching lines
-- is printed by grep to separate each match, so we skip that too

If you just want the next line after a pattern, this sed command will work
sed -n -e '/pattern/{n;p;}'
-n supresses output (quiet mode);
-e denotes a sed command (not required in this case);
/pattern/ is a regex search for lines containing the literal combination of the characters pattern (Use /^pattern$/ for line consisting of only of “pattern”;
n replaces the pattern space with the next line;
p prints;
For example:
seq 10 | sed -n -e '/5/{n;p;}'
Note that the above command will print a single line after every line containing pattern. If you just want the first one use sed -n -e '/pattern/{n;p;q;}'. This is also more efficient as the whole file is not read.
This strictly sed command will print all lines after your pattern.
sed -n '/pattern/,${/pattern/!p;}
Formatted as a sed script this would be:
/pattern/,${
/pattern/!p
}
Here’s a short example:
seq 10 | sed -n '/5/,${/5/!p;}'
/pattern/,$ will select all the lines from pattern to the end of the file.
{} groups the next set of commands (c-like block command)
/pattern/!p; prints lines that doesn’t match pattern. Note that the ; is required in early versions, and some non-GNU, of sed. This turns the instruction into a exclusive range - sed ranges are normally inclusive for both start and end of the range.
To exclude the end of range you could do something like this:
sed -n '/pattern/,/endpattern/{/pattern/!{/endpattern/d;p;}}
/pattern/,/endpattern/{
/pattern/!{
/endpattern/d
p
}
}
/endpattern/d is deleted from the “pattern space” and the script restarts from the top, skipping the p command for that line.
Another pithy example:
seq 10 | sed -n '/5/,/8/{/5/!{/8/d;p}}'
If you have GNU sed you can add the debug switch:
seq 5 | sed -n --debug '/2/,/4/{/2/!{/4/d;p}}'
Output:
SED PROGRAM:
/2/,/4/ {
/2/! {
/4/ d
p
}
}
INPUT: 'STDIN' line 1
PATTERN: 1
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 2
PATTERN: 2
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 3
PATTERN: 3
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
COMMAND: p
3
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 4
PATTERN: 4
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
END-OF-CYCLE:
INPUT: 'STDIN' line 5
PATTERN: 5
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:

remove word using "grep -v" exact match only

How can I do an exact match using grep -v?
For example: the following command
for i in 0 0.3 a; do echo $i | grep -v "0"; done
returns a. But I want it to return 0.3 a.
Using
grep -v "0\b"
is not working

for i in 0 0.3 a; do echo $i | grep -v "^0$"; done
You need to match the start and end of the string with ^ and $
So, we say "match the beginning of a line, the char 0 and then the end of the line.
$ for i in 0 0.3 a; do echo $i | grep -v "^0$"; done
0.3
a

The safest way for single-column entries is using awk. Normally, I would use grep with the -w flag, but since you want to exactly match an integer that could be part of a float, it is a bit more tricky. The <dot>-character makes it hard to use any of
grep -vw 0
grep -v '\b0\b'
grep -v '\<0\>'
The proposed solution also will only work on perfect lines, what if you have a lost space in front or after your zero. The line will fail. So the safest would be:
single column file:
awk '($1!="0")' file
multi-word file: (adopt the variable FS to fit your needs)
awk '{for(i=1;i<=NF;++i) if($i == "0") next}1' file

How do I 'grep -c' and avoid printing files with zero '0' count

The command 'grep -c blah *' lists all the files, like below.
% grep -c jill *
file1:1
file2:0
file3:0
file4:0
file5:0
file6:1
%
What I want is:
% grep -c jill * | grep -v ':0'
file1:1
file6:1
%
Instead of piping and grep'ing the output like above, is there a flag to suppress listing files with 0 counts?
SJ

How to grep nonzero counts:
grep -rIcH 'string' . | grep -v ':0$'
-r Recurse subdirectories.
-I Ignore binary files (thanks #tongpu, warlock).
-c Show count of matches. Annoyingly, includes 0-count files.
-H Show file name, even if only one file (thanks #CraigEstey).
'string' your string goes here.
. Start from the current directory.
| grep -v ':0$' Remove 0-count files. (thanks #LaurentiuRoescu)
(I realize the OP was excluding the pipe trick, but this is what works for me.)

Just use awk. e.g. with GNU awk for ENDFILE:
awk '/jill/{c++} ENDFILE{if (c) print FILENAME":"c; c=0}' *

How to count total number of matches of regular expression per file on AIX

Grep is usually used to display the lines containing a match of the specified pattern. Is there any way in AIX to display the total number of matches of the pattern in each file searched? That is to say, every match in every line should be counted.
I tried grep -c pattern filename, but that only counts each matching line once however many matches it contains.

grep -o foo filename.txt | wc -l

Finding the 3 occurrences of b. in this file:
$ cat file
a bc d be f
bg h
$ awk '{c+=gsub(/b./,"")} END{print c+0}' file
3
The above will work with any awk on any OS (except old, broken awk of course).

You need to match the patterns first, then count the number of matches.
The -o switch will yield each match on a new line.
Then just count the total number of lines.
Something like:
grep -o pattern filename | wc -l

How to grep for a 7 digit hexadecimal string and return only that hexadecimal string?

I am trying to extract all the leading 7 digit hexadecimal strings in a file, that contains lines such as:
3fce110:: ..\Utilities\c\misc.c(431): YESFREED (120 bytes) Misc

egrep -o '^[0-9a-f]{7}\b' file.txt
egrep is the same as grep -E; it uses extended regexp.
-o prints only the matching part of each line.
^ anchors the match to the beginning of the line.
[0-9a-f]{7} matches seven hexadecimal characters. If you want to match uppercase letters add A-F here or add the -i flag.
\b checks for a word boundary; it ensures we don't match hex numbers more than 7 digits long.

If all the lines in the file follow the given format then a couple of methods:
$ grep -o '^[^:]*' file
3fce110
$ awk -F: '{print $1}' file
3fce110
$ cut -d: -f1 file
3fce110
$ sed  's/:.*//' file
3fce110

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

grep count all matches in one line of single pattern - grep

How can I count all matches with grep, when there are more than one match per line? $ cat example.txt int foo(int a, char b) { int i = 80; return i; } $ grep -c "int" example.txt 1 I want to output 3 (since int appears 3 times in the file)

With grep that supports -o option: $ grep -o 'int' ip.txt | wc -l 3 With ripgrep: $ rg -oc 'int' ip.txt 3

Consider this awk approach using search word a field separator: awk -F 'int' '{print NF-1}' file 3

Related

Is it possible to show all lines after match with grep/ripgrep? [duplicate]

remove word using "grep -v" exact match only

How do I 'grep -c' and avoid printing files with zero '0' count

How to count total number of matches of regular expression per file on AIX

How to grep for a 7 digit hexadecimal string and return only that hexadecimal string?

Categories

Resources