How can I find a specific GID within proc stats using grep and PCRE

I am trying to fix a problematic polkit rule that checks whether the process asking for permission is running with a specific group ID.
The script (not mine) does this by using a PCRE regex to scrape the GID out of /proc/####/status.
The original regex is written as follows:
"^Groups:.+?\\\s990[\\\s\\\0]";
However, when I run this snippet using grep -Po "^Groups:.+?\\\s990[\\\s\\\0]" /proc/2457/status, nothing is returned. After experimenting with the pattern, I determined that it does not match because it expects the Groups: line to have only one entry. Mine has two, since the user running the process belongs to two groups (990 and 992).
This is what the output that the regex needs to match against looks like:
Name: python
Umask: 0022
State: S (sleeping)
Tgid: 3479
Ngid: 0
Pid: 3479
PPid: 1
TracerPid: 0
Uid: 990 990 990 990
Gid: 990 990 990 990
FDSize: 128
Groups: 990 992
NStgid: 3479
NSpid: 3479
NSpgid: 3479
My problem is that I am not sure how to build the right pattern. On this system GID 990 is listed first, so I could just match anything up to the second space character. But how do I match the same GID if it appears elsewhere in the line, say as the second or third entry?
My Perl regex skills are rusty. Is there an easy way to do this? I don't think I could do this by tokenizing the match, since grep just provides a true or false reply.
Here is the snippet from the polkit rules I am working with.
var regex = "^Groups:.+?\\\s990[\\\s\\\0]";;
var cmdpath = "/proc/" + subject.pid.toString() + "/status";
try {
polkit.spawn(["grep", "-Po", regex, cmdpath]);
return polkit.Result.YES;
} catch (error) {
return polkit.Result.NOT_HANDLED;
}

There seem to be only digits and spaces following Groups:. If you are using PCRE, you might use:
^Groups:\h[\h\d]*\b990\b
Explanation
^ Start of string
Groups:\h Match Groups: followed by a horizontal whitespace char
[\h\d]* Match optional horizontal whitespace chars or digits
\b990\b Match 990 between word boundaries
See a regex101 demo.
Example using grep with -P for Perl-compatible regular expressions:
grep -P "^Groups:\h[\h\d]*\b990\b" file
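As a rough sketch of how this pattern could be wrapped for reuse, the function below checks for a GID anywhere on the Groups: line and rejects near misses such as 1990 or 992. The name has_group and the sample status file are made up for illustration, and the sketch assumes GNU grep built with PCRE support.

```shell
# has_group GID FILE: succeed if GID appears as a whole number on the Groups: line.
# has_group is a hypothetical helper name; -P needs GNU grep built with PCRE.
has_group() {
    grep -qP "^Groups:\h[\h\d]*\b$1\b" "$2"
}

# Made-up sample; a real /proc/<pid>/status file has many more fields.
status_file=$(mktemp)
printf 'Gid:\t990\t990\t990\t990\nGroups:\t1990 992 990 \n' > "$status_file"

has_group 990 "$status_file" && echo "990: member"
has_group 99 "$status_file" || echo "99: not a member"

rm -f "$status_file"
```

Because the function only reports success or failure through its exit status, it fits the same true/false usage as the grep call in the question.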

Any chance you are adding an extra backslash to the regex?
Besides that, you may want to use this regex:
^Groups:.*?\s990\b
See https://regex101.com/r/XkpzGF/1
echo "Groups: 990 991 992" | grep -P "^Groups:.*?\\s990\b"
Groups: 990 991 992
echo "Groups: 991 990 992" | grep -P "^Groups:.*?\\s990\b"
Groups: 991 990 992
echo "Groups: 991 992 990" | grep -P "^Groups:.*?\\s990\b"
Groups: 991 992 990
I think it is simpler to use a word boundary (\b) at the end instead of the character class.
Also, if you want to match the whitespace before 990, the quantifier should be changed from + to *. Otherwise, even though the quantifier is lazy, .+? must match at least one character; it consumes the space, so when 990 is the first entry the mandatory \s that follows has nothing left to match.
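The difference between the two quantifiers can be seen with a small experiment. This is only a sketch: check is a hypothetical helper name, and it assumes GNU grep built with PCRE support.

```shell
# check PATTERN TEXT: report whether grep -P finds PATTERN in TEXT.
# check is a hypothetical helper; it needs GNU grep built with PCRE support.
check() {
    if printf '%s\n' "$2" | grep -qP "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# .+? must take at least one character, which is the only space before 990.
check '^Groups:.+?\s990\b' 'Groups: 990 992'
# .*? may match nothing, so \s is free to match that space.
check '^Groups:.*?\s990\b' 'Groups: 990 992'
# 990 later in the line is found as well.
check '^Groups:.*?\s990\b' 'Groups: 991 992 990'
```

The first call prints "no match" and the other two print "match".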

Related

Linux Grep Command - Extract multiple texts between strings

Context;
After running the following command on my server:
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 > analisis.txt
I get a text file with thousands of lines like this example:
loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3,
SId-mscp01.herpgwXX.epc.mncXXX.mccXXX.XXXXX.org;25b8510c;621dbaab;3341100102036XX-27cf0XXX,
RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423
(X represents incrementing numbers)
Problem:
The output is filled with unnecessary data. The only string portions I need are the MSISDN,IMSI comma separated for each line, like this:
5281181XXXXX,3341100102036XX
Steps I tried
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -o -P '(?<=SubId-).*?(?=, REQ)' > analisis1.txt
This gave me the first part of the solution
5281181XXXXX
However, when I tried to get the second string, located between '334110' and "-":
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -o -P '(?<=SubId-).*?(?=, REQ)' | grep -o -P '(?<=334110).*?(?=-)' > analisis1.txt
it doesn't work.
Any input will be appreciated.
To get 5281181XXXXX or the second string located between '334110' and "-" you can use a pattern like:
\b(?:SubId-|334110)\K[^,\s-]+
The pattern matches:
\b A word boundary to prevent a partial word match
(?: Non capture group to match as a whole
SubId- Match literally
| Or
334110 Match literally
) Close the non capture group
\K Forget what is matched so far
[^,\s-]+ Match 1+ occurrences of any char except a comma, a whitespace char, or -
See the matches in this regex demo.
That will match:
5281181XXXXX
0102036XX
The command could look like
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+' > analisis1.txt
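Note that grep -o prints each match on its own line. If the comma-separated MSISDN,IMSI form from the question is needed, one possible alternative is a sed substitution with two capture groups. This is a sketch under assumptions: extract_ids is a hypothetical helper name, each log entry is taken to be a single line, and the IMSI is taken to be the 334110... value that follows the last semicolon and ends at the next hyphen.

```shell
# extract_ids: read log lines on stdin, print "MSISDN,IMSI" for each matching line.
# Hypothetical helper; assumes one log entry per line.
extract_ids() {
    sed -nE 's/.*SubId-([^,]+),.*;(334110[^-]*)-.*/\1,\2/p'
}

# The example entry from the question, joined onto one line.
line='loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3, SId-mscp01.herpgwXX.epc.mncXXX.mccXXX.XXXXX.org;25b8510c;621dbaab;3341100102036XX-27cf0XXX, RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423'

printf '%s\n' "$line" | extract_ids
```

For the sample entry this prints 5281181XXXXX,3341100102036XX. In the real pipeline the zgrep output would be piped into extract_ids in place of the second grep.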

Get content inside brackets using grep

I have text that looks like this:
Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)
I want to use grep or some other way to get the ID inside [].
How to do it?
You can do something like this via bash (GNU grep required):
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"
echo "$t" | grep -Po "(?<=\[).*(?=\])"
The pattern gives you everything between the brackets. It uses a zero-width look-behind assertion (?<= ...) to exclude the opening bracket and a zero-width look-ahead assertion (?= ...) to exclude the closing bracket.
The -P flag enables Perl-style regexes, which need less escaping. The -o flag prints only the matched text; because the lookaround assertions are zero-width, the brackets are not part of it.
If you don't have GNU grep available, you can solve the problem in two steps (there are probably also other solutions):
Get the ID with the brackets (\[.*\])
Remove the brackets (] and [, here via sed, for example)
echo "$t" | grep -o "\[.*\]" | sed 's/[][]//g'
As Cyrus commented, you can also use the pattern grep -oE '[0-9A-F-]{36}' if you can ensure not having strings of length 36 or larger containing only the characters 0-9, A-F and - and if all the IDs have the length of 36 characters, of course. Then you can simply ignore the brackets.
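When the text is already in a shell variable, another option is to skip grep entirely and use parameter expansion. The following is a minimal sketch, assuming the string contains exactly one bracketed ID; the variable name id is arbitrary.

```shell
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"

id=${t#*\[}     # drop everything up to and including the first "["
id=${id%%\]*}   # drop the first "]" and everything after it

printf '%s\n' "$id"
```

This prints 113C188D-5F70-44FE-A709-A07A5289B75D and relies only on POSIX shell features, so it does not depend on which grep is installed.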

grep for path in process(ps) containing number

I would like to grep for a process path that contains a variable part. For example,
this is one of the processes running:
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000
I would like to grep for "psd_folr/rcerr-m-deve-udf-172/bin/magt queue" from the running processes.
The catch is that the number 172 keeps changing, but it will always be a 3-digit number. Please suggest a fix; I tried the command below, but it returns no output.
sudo ps axu | grep "psd_folr/rcerr-m-deve-udf-'^[0-9]$'/bin/magt queue"
The most relevant section of your regular expression is -'^[0-9]$'/, which has the following problems:
the apostrophes have no special meaning to grep; they simply match literal apostrophes
the caret ^ matches the beginning of a line, but there is no beginning of a line in ps's output at this place
the dollar $ matches the end of a line, but there is no end of a line in ps's output at this place
you want to read 3 digits but [0-9] will only match a single one
Thus, this part of your expression should be changed to -[0-9]+/ to match any number of digits (+ matches the preceding item one or more times), or to -[0-9]{3}/ to match exactly three digits ({n} matches the preceding item exactly n times).
If you alter your command, give grep the -E flag so it uses extended regular expressions, otherwise you need to escape the plus or the braces:
sudo ps axu | grep -E "psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue"
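The corrected pattern can be tried against the sample command line from the question without involving ps. A small sketch follows; in_cmdline is a hypothetical helper name.

```shell
# Sample command line copied from the question.
cmdline='/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000'

# in_cmdline PATTERN: print yes/no depending on whether the ERE matches cmdline.
# Hypothetical helper for illustration.
in_cmdline() {
    if printf '%s\n' "$cmdline" | grep -qE "$1"; then
        echo "yes"
    else
        echo "no"
    fi
}

in_cmdline 'psd_folr/rcerr-m-deve-udf-[0-9]{3}/bin/magt queue'   # exactly three digits
in_cmdline 'psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue'     # one or more digits
in_cmdline 'psd_folr/rcerr-m-deve-udf-[0-9]/bin/magt queue'      # a single digit only
```

The first two calls print "yes"; the last prints "no", because a single [0-9] cannot cover all three digits of 172.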

dot in grep command being used as regex

I'm trying to understand whether bash is doing something with the string before passing it to grep, or whether grep uses basic regex searching by default. The man page and other answers don't really clarify this.
ss -an | grep "8.02"
u_dgr UNCONN 0 0 * 820284002 * 820284001
u_str ESTAB 0 0 * 820283949 * 820287456
It looks like the . is being used in a regex fashion to match a single char. However, I would only expect this to happen when using grep -e or grep -E. If bash was intercepting the string I would expect special shell chars to be intercepted first such as * or ?.
The man entry states I am using GNU grep 3.1
Looks like I have immediately found the answer after RTFMing a little closer
-G, --basic-regexp
Interpret PATTERN as a basic regular expression (BRE, see below). This is the default.
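The default can be checked directly with a two-line input. This is a minimal sketch; the sample value is made up, with one line that contains a literal "8.02" and one that only matches when the dot is treated as a wildcard.

```shell
sample='820284002
8.02'

# Default (BRE): the dot matches any single character, so both lines count.
printf '%s\n' "$sample" | grep -c '8.02'

# Escaping the dot makes it literal, so only the second line counts.
printf '%s\n' "$sample" | grep -c '8\.02'

# -F treats the whole pattern as a fixed string, with the same result.
printf '%s\n' "$sample" | grep -cF '8.02'
```

The three commands print 2, 1 and 1. The pattern is inside quotes, so the shell passes it to grep unchanged; the wildcard behaviour comes from grep itself.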
"This is the default" - I assume means this is the default behaviour if no flags are passed?

Grep's word boundaries include spaces?

I tried to use grep to search for lines containing the word "bead" using "\b" but it doesn't find the lines containing the word "bead" separated by space. I tried this script:
cat in.txt | grep -i "\bbead\b" > out.txt
I get results like
BEAD-air.JPG
Bead, 3 sided MET DP110317.jpg
Bead. -2819 (FindID 10143).jpg
Bead(Gem), Artefacts of Phu Hoa site(Dong Nai province).jpg
Romano-British pendant amulet (bead) (FindID 241983).jpg
But I don't get the results like
Bead fun.jpg
Instead of getting some 2,000 lines, I'm only getting 92 lines
My OS is Windows 10 - 64 bit but I'm using grep 2.5.4 from the GnuWin32 package.
I've also tried the MSYS2, which includes grep 3.0 but it does the same thing.
And then, how can I search for words separated by space?
LATER EDIT:
It looks like grep has problems with big files. My input file is 2.4 GB in size. With smaller files, it works - I reported the bug here: https://sourceforge.net/p/getgnuwin32/discussion/554300/thread/03a84e6b/
Try this:
cat in.txt | grep -wi "bead"
-w performs a whole-word search.
What you are doing normally should work but there are ways of setting what is and is not considered a word boundary. Rather than worry about it please try this instead:
cat in.txt | grep -iP "\bbead(\b|\s)" > out.txt
The -P option enables Perl-compatible regular expressions, and \s matches any whitespace character. The alternation bar | separates the alternatives within the parentheses ( ).
While you are waiting for grep to be fixed you could use another tool if it is available to you. E.g.
perl -lane 'print if (m/\bbead\b/i);' in.txt > out.txt
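On a small input, \b does treat a space as a word boundary, which fits the later finding that the file size was the real problem. The sketch below assumes GNU grep (\b is a GNU extension in basic regular expressions); the third file name is made up to show a non-match.

```shell
names='Bead fun.jpg
BEAD-air.JPG
Beads and shells.jpg'

# Both commands print the first two names; "Beads" is skipped because
# the "s" after "bead" is a word character, so there is no boundary there.
printf '%s\n' "$names" | grep -i '\bbead\b'
printf '%s\n' "$names" | grep -wi 'bead'
```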
