Linux Grep Command - Extract multiple texts between strings - grep

After running the following command on my server:
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 > analisis.txt
I get a text file with thousands of lines like this example:
loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3,;25b8510c;621dbaab;3341100102036XX-27cf0XXX, RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423
(X represents incrementing numbers)
The output is filled with unnecessary data. The only parts I need are the MSISDN and the IMSI, comma-separated, one pair per line, like this:
Steps I tried
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022| grep -o -P '(?<=SubId-).*?(?=, REQ)' > analisis1.txt
This gave me the first part of the solution
However, when I tried to get the second string, located between '334110' and "-":
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022| grep -o -P '(?<=SubId-).*?(?=, REQ)' | grep -o -P '(?<=334110).*?(?=-)' >
it doesn't work.
Any input will be appreciated.

To get 5281181XXXXX, or the second string located between '334110' and "-", you can use a pattern like:
\b(?:SubId-|334110)\K[^,\s-]+
The pattern matches:
\b A word boundary to prevent a partial word match
(?: Non capture group to match as a whole
SubId- Match literally
| Or
334110 Match literally
) Close the non capture group
\K Forget what is matched so far
[^,\s-]+ Match 1+ occurrences of any character except a comma, a whitespace character, or a hyphen
See the matches in this regex demo.
For the example line, that will match 5281181XXXXX and 0102036XX.
The command could look like
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+' > analisis1.txt
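Since the question asks for both values comma-separated on one line, here is a minimal sketch of one way to join them. It assumes GNU grep (for -P) and that every matching log line yields exactly two matches, SubId first. The sample line and the digits replacing the X placeholders are made up for illustration.

```shell
# Hypothetical sample line; the X placeholders from the question are replaced with made-up digits.
line='loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-528118100001, REQNO-1, REQTYPE-3,;25b8510c;621dbaab;334110010203600-27cf0000, RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423'

# grep -o prints each match on its own line; paste -d, - - then joins
# every two consecutive lines with a comma.
result=$(printf '%s\n' "$line" \
  | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+' \
  | paste -d, - -)

printf '%s\n' "$result"
```

If a log line ever produces one or three matches, the pairs drift out of step, so this is only suitable when the two-matches-per-line assumption holds.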


Get content inside brackets using grep

I have text that looks like this:
Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)
I want to use grep or some other way to get the ID inside [].
How to do it?
You can do something like this via bash (GNU grep required):
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"
echo "$t" | grep -Po "(?<=\[).*(?=\])"
The pattern will give you everything between the brackets, and uses a zero-width look-behind assertion (?<= ...) to eliminate the opening bracket and uses a zero-width look-ahead assertion (?= ...) to eliminate the closing bracket.
The -P flag activates Perl-style regexes, which support these look-around assertions. The -o flag prints only the matched part of the line; because the look-behind and look-ahead are zero-width, the brackets themselves are not part of the match.
If your grep does not support -P, you can solve the problem in two steps (there are probably also other solutions):
Get the ID with the brackets (\[.*\])
Remove the brackets (] and [, here via sed, for example)
echo "$t" | grep -o "\[.*\]" | sed 's/[][]//g'
As Cyrus commented, you can also use the pattern grep -oE '[0-9A-F-]{36}' if you can ensure not having strings of length 36 or larger containing only the characters 0-9, A-F and - and if all the IDs have the length of 36 characters, of course. Then you can simply ignore the brackets.
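The approaches above can be compared side by side. This is a small sketch, assuming GNU grep for the -P variant. The `[^]]*` form is a swapped-in variation on `.*` that stops at the first closing bracket, and the sed-only variant is an additional alternative not taken from the answer; both are meant for lines with a single bracketed group.

```shell
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"

# Look-around variant: match everything between [ and the first ].
id_lookaround=$(printf '%s\n' "$t" | grep -Po '(?<=\[)[^]]*(?=\])')

# sed-only variant: capture the bracket contents and print just the capture.
id_sed=$(printf '%s\n' "$t" | sed -n 's/.*\[\([^]]*\)\].*/\1/p')

# Fixed-length variant: 36 characters drawn from 0-9, A-F and "-".
id_fixed=$(printf '%s\n' "$t" | grep -oE '[0-9A-F-]{36}')

printf '%s\n' "$id_lookaround" "$id_sed" "$id_fixed"
```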

grep for path in process(ps) containing number

I would like to grep for a process path that contains a variable part. Example:
This is one of the processes running.
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000
I would like to grep for "psd_folr/rcerr-m-deve-udf-172/bin/magt queue" from the running processes.
The catch is that the number 172 keeps changing, but it will be a 3 digit number only. Please suggest, I tried below but it is not returning any output.
sudo ps axu | grep "psd_folr/rcerr-m-deve-udf-'^[0-9]$'/bin/magt queue"
The most relevant section of your regular expression is -'^[0-9]$'/, which has the following problems:
the apostrophes have no special meaning to grep here; inside the double quotes they are passed through as literal apostrophes to match
the caret ^ matches the beginning of a line, but there is no beginning of a line in ps's output at this place
the dollar $ matches the end of a line, but there is no end of a line in ps's output at this place
you want to match 3 digits, but [0-9] matches only a single one
Thus, that part of your expression should be changed to -[0-9]+/ to match any number of digits (+ matches the preceding item one or more times), or to -[0-9]{3}/ to match exactly three digits ({n} matches the preceding item exactly n times).
If you alter your command, give grep the -E flag so it uses extended regular expressions; otherwise you need to escape the plus or the braces:
sudo ps axu | grep -E "psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue"
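To check the exactly-three-digits form without a live process list, the pattern can be run against a few sample lines. This is only a sketch: the second and third sample paths are invented variations of the one in the question.

```shell
# Sample "process" lines; only the first has a three-digit number.
sample='/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-9/bin/magt queue:consumers:start
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-4821/bin/magt queue:consumers:start'

# Extended syntax (-E): braces need no escaping.
ere_matched=$(printf '%s\n' "$sample" \
  | grep -E 'psd_folr/rcerr-m-deve-udf-[0-9]{3}/bin/magt queue')

# Basic syntax (no -E): the braces must be escaped.
bre_matched=$(printf '%s\n' "$sample" \
  | grep 'psd_folr/rcerr-m-deve-udf-[0-9]\{3\}/bin/magt queue')

printf '%s\n' "$ere_matched"
```

Because the digits sit between a literal - and a literal /, the four-digit line is rejected as well as the one-digit line.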

Match Lines From Two Lists With Wildcards In One List

I have two lists, one of which contains wildcards (in this case represented by *). I would like to compare the two lists and create an output of those that match, with each wildcard * representing a single character.
For example:
File 1
File 2
The first two lines are not considered matches because the number of *s is not equal to the number of characters shown in the first file. The latter two are, so they are added to output.
I have tried to reason out ways to do this in AWK and using Join, but I don't know any way to even start trying to achieve this. Any help would be greatly appreciated.
$ cat tst.awk
# First file: remember every line of file1
NR==FNR {
    file1[$0]
    next
}
# Second file: turn each wildcard line into an anchored regexp
{
    # Make every non-* char literal:
    gsub(/[^^*]/,"[&]")   # Convert every char X to [X] except ^ and *
    gsub(/\^/,"\\^")      # Convert every ^ to \^
    # Convert every * to .:
    gsub(/\*/,".")
    # Add line start/end anchors
    $0 = "^" $0 "$"
    # See if the current file2 line matches any line from file1
    # and if so print that line from file1:
    for ( line in file1 ) {
        if ( line ~ $0 ) {
            print line
        }
    }
}
$ awk -f tst.awk file1 file2
sed 's/\./\\./g; s/\*/./g' file2 | xargs -I{} grep {} file1
I'd take advantage of regular expression matching. To do that, we need to turn every asterisk * into a dot ., which represents any single character in regular expressions. As a side effect of enabling regular expressions, we need to escape all special characters, particularly the ., in order for them to be taken literally. In a regular expression, we need to use \. to represent a dot (as opposed to any character).
The first step is to perform these substitutions with sed; the second is to pass every resulting line as a search pattern to grep and search file1 for that pattern. The glue that makes this possible is xargs, where {} is a placeholder representing a single line from the output of the sed command.
This is not a general, safe solution you can simply copy and paste: you should watch out for any characters, in your file containing the asterisks, that are considered special in grep regular expressions.
jhnc extends the escaping to any of the following characters: . \ ^ $ [, thus accounting for almost all sorts of email addresses. They then avoid the use of xargs by employing -f - to pass the results of sed as search expressions to grep:
sed 's/[.\\^$[]/\\&/g; s/[*]/./g' file2 | grep -f - file1
This solution is both more general and more efficient, since grep runs once with all the patterns instead of once per pattern.
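One detail worth checking is that the sed-generated patterns are not anchored, so a pattern can match in the middle of a longer line. The sketch below adds grep's -x option (whole-line match) as one possible way to enforce the "one * per character" rule. The file contents are hypothetical sample data, not the lists from the question.

```shell
# Hypothetical sample data, written to a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'abc@example.com' 'abcd@example.com' 'zabc@example.com' 'xyz@test.org' > "$dir/file1"
printf '%s\n' 'ab*@example.com' 'ab**@example.com' 'x***@test.org' > "$dir/file2"

# Escape regex metacharacters, turn each * into ".", then let grep read
# the patterns from stdin (-f -). -x requires the whole line to match.
matches=$(sed 's/[.\\^$[]/\\&/g; s/[*]/./g' "$dir/file2" | grep -x -f - "$dir/file1")

# Same patterns without -x, for comparison: substrings may match too.
loose=$(sed 's/[.\\^$[]/\\&/g; s/[*]/./g' "$dir/file2" | grep -f - "$dir/file1")

printf 'anchored:\n%s\nunanchored:\n%s\n' "$matches" "$loose"
rm -r "$dir"
```

With this data the unanchored run also reports zabc@example.com, because ab*@example.com matches inside it, while the -x run does not.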

Grep: First word in line that begins with ? and ends with?

I'm trying to write a grep command that finds all lines in a file whose first word begins with "as" and also ends with "ng".
How would I go about doing this using grep?
This should just about do it:
$ grep '^as\w*ng\b' file
^ # Matches start of the line
as # Matches literal string as
\w # Matches characters in word class
* # Quantifies \w to match either zero or more
ng # Matches literal string ng
\b # Matches word boundary
May have missed the odd corner case.
If you only want to print the words that match and not the whole lines then use the -o option:
$ grep -o '^as\w*ng\b' file
Read man grep for all information on the available options.
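A quick way to see what the pattern accepts and rejects is to run it over a few sample lines. The sample text below is made up, and \w and \b are GNU grep extensions, so this sketch assumes GNU grep.

```shell
# Made-up sample input: three lines should match, two should not.
sample='asking for help
assuming nothing
as long as nothing
astounding! indeed
fasting today'

# Whole matching lines.
lines=$(printf '%s\n' "$sample" | grep '^as\w*ng\b')

# Only the matching first words.
words=$(printf '%s\n' "$sample" | grep -o '^as\w*ng\b')

printf '%s\n' "$words"
```

"as long as nothing" is rejected because \w* cannot cross the space after "as", and "fasting today" is rejected by the ^ anchor.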
I am pretty sure this should work:
grep "^as[a-zA-Z]*ng\b" <filename>
hard to say without seeing samples from the actual input file.
sudo has already covered it well, but I wanted to throw out one more simple one:
grep -i '^as[^ ]*ng\b' <file>
-i to make grep case-insensitive
[^ ]* matches zero or more of any character, except a space
^ anchors the match at the start of a line, so you can search for lines beginning with 'as' with:
grep '^as' [file]
\w matches a word character, so \w* would match any number of word characters:
grep '^as\w*' [file]
\b matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the line). You can use it to ensure that you're matching the 'ng' letters at the end of the word, instead of just somewhere in the middle:
grep '^as\w*ng\b' [file]
If you choose to omit the [file], simply pipe the input into grep instead:
cat [file] | grep '^as\w*ng\b'
echo [some text here] | grep '^as\w*ng\b'
Is that what you're looking for?

How to filter using grep on a selected word

grep (GNU grep) 2.14
I have a log file that I want to filter on a selected word. However, the filter matches more than I intend. For example:
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
However, I don't want to display anything that continues after the 1; grep is also picking up 11, 12, 13, etc.
Many thanks for any suggestions,
You can restrict the word to end at 1:
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This works as long as the word you want to match is exactly "P_SIP:N_iptB1T1".
But if you also want to pick the word out of longer ones such as P_SIP:N_iptB1T1x and display only the matched part rather than the whole line, use -o:
grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN
At least two approaches can be tried:
grep -w pattern matches whole words only. It works for this case too, even though the pattern contains punctuation.
grep pattern -m 1 stops after the first matching line. (Also doable with grep xxx | head -1)
If the lines contain the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there are blanks or line ends after the word, you can match these too:
# The word is followed by one or more blanks
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file
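The end-of-word and whole-word approaches from the answers above can be tried on a few sample lines. The log lines here are invented for illustration, and \> is a GNU grep extension, so this sketch assumes GNU grep.

```shell
# Invented sample log lines; only the first and last contain the exact word.
sample='call P_SIP:N_iptB1T1 setup
call P_SIP:N_iptB1T11 setup
call P_SIP:N_iptB1T12 setup
call "P_SIP:N_iptB1T1" quoted'

# \> requires the match to end at the end of a word.
end_of_word=$(printf '%s\n' "$sample" | grep 'P_SIP:N_iptB1T1\>')

# -w requires the match to be a whole word on both sides.
whole_word=$(printf '%s\n' "$sample" | grep -w 'P_SIP:N_iptB1T1')

# -o prints only the matched text instead of the whole line.
only_match=$(printf '%s\n' "$sample" | grep -o 'P_SIP:N_iptB1T1\>')

printf '%s\n' "$end_of_word"
```

Both variants skip the T11 and T12 lines, and both accept the quoted form because a double quote is not a word character.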
