Grep content out of txt file with regex - grep

I have a list in a txt file which looks like this.
10.9.0.18,tom,34.0.1.2:44395,Thu Apr 18 07:14:20 2019
10.9.0.10,jonas,84.32.45.2:44016,Thu Apr 18 07:16:06 2019
10.9.0.6,philip,23.56.222.3:55202,Thu Apr 18 07:16:06 2019
10.9.0.26,coolguy,12.34.56.7:53316,Thu Apr 18 07:16:06 2019
I would like to have a script which provides me with the following output:
tom jonas philip coolguy
I've been looking into something like this:
grep -oP "^10.9.0.*,wq$1-\K.*" | cut -d, -f1 | sort
But I am not quite getting there, getting no output at all.

Extract second field
Replace newlines with spaces
cat <<EOF |
10.9.0.18,tom,34.0.1.2:44395,Thu Apr 18 07:14:20 2019
10.9.0.10,jonas,84.32.45.2:44016,Thu Apr 18 07:16:06 2019
10.9.0.6,philip,23.56.222.3:55202,Thu Apr 18 07:16:06 2019
10.9.0.26,coolguy,12.34.56.7:53316,Thu Apr 18 07:16:06 2019
EOF
cut -d, -f2 | tr '\n' ' '
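Note that the tr version leaves a trailing space and no final newline. A minimal sketch of one way around that, assuming the data is stored in a file (the temporary file below only stands in for your txt file), is to join the lines with paste instead:

```shell
# Temporary file standing in for the txt file from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
10.9.0.18,tom,34.0.1.2:44395,Thu Apr 18 07:14:20 2019
10.9.0.10,jonas,84.32.45.2:44016,Thu Apr 18 07:16:06 2019
10.9.0.6,philip,23.56.222.3:55202,Thu Apr 18 07:16:06 2019
10.9.0.26,coolguy,12.34.56.7:53316,Thu Apr 18 07:16:06 2019
EOF

# cut selects the second comma-separated field; paste -s joins all
# input lines into one line, separated by a single space.
names=$(cut -d, -f2 "$data" | paste -s -d ' ' -)
echo "$names"

rm -f "$data"
```

This prints tom jonas philip coolguy followed by a normal newline.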

You're not getting output because grep doesn't return anything (you don't need perl regex for this).
You'll need to select the second field too:
grep '^10\.9\.0\.' data.txt | cut -d, -f2

If awk is an option you could try:
awk -F, '{printf "%s ", $2} END {print ""}' file.txt
The {printf "%s ", $2} part prints each name followed by a space instead of the default newline.
The END {print ""} adds a newline after the last name.

This is the right answer if you want multi-line output:
$ awk -F, '/^10\.9\.0/{print $2}' file
tom
jonas
philip
coolguy
or this for single:
$ awk -F, '/^10\.9\.0/{o=o s $2; s=OFS} END{print o}' file
tom jonas philip coolguy
You need to escape the .s, as an unescaped . matches any character in a regexp. You don't need to add a .* at the end of the regexp either, as that'll literally match "something or nothing" and so changes nothing.
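To see why the escaping matters, here is a small sketch with two made-up lines. The second line is not an address in 10.9.0.x, but the unescaped pattern accepts it anyway:

```shell
# Two made-up lines for illustration; "mallory" is not in 10.9.0.x.
sample='10.9.0.18,tom
1079x0y3,mallory'

# Unescaped: each "." matches any character, so both lines match.
unescaped=$(printf '%s\n' "$sample" | grep -c '^10.9.0')

# Escaped: each "\." matches only a literal dot, so one line matches.
escaped=$(printf '%s\n' "$sample" | grep -c '^10\.9\.0\.')

echo "unescaped: $unescaped, escaped: $escaped"
```

This prints unescaped: 2, escaped: 1.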

Related

How do I grep starting at a specific point in a log file?

I have daily syslog files which contain syslog messages in the format: MMM DD HH:MM:SS additional_data_here
We make a change, then want to see if the syslog messages continue.
For example, say a change was made at 09:55. I want to ignore everything prior to the first line that contains Oct 29 09:55:00. Then I want to grep for my error message after that first line match.
For this example, I have to create several different statements, like this:
grep -e "Oct 29 09:5[5-9]" syslog20211029 | grep "[my message]"
grep -e "Oct 29 1[0-1]:" syslog20211029 | grep "[my message]"
But I do this often enough that I'd like to find a better, more consistent way. Something like:
start-at-first-match "Oct 29 09:55:00" syslog20211029 | grep "[my message]"
But I don't know what the start-at-first-match option is. Any suggestions?
With grep alone you can't really do this, but the -A num option (print num lines after each match) can still meet your need if you give it a big enough number:
grep -A 10000000 "Oct 29 09:55:00" syslog20211029
This will print the matching line and the next 10 million.
If you want everything that follows the line for sure (without having to give an "unreachable" number of lines), you have to use another command such as sed or awk. Using sed: sed -n '/Oct 29 09:55:00/,$ p' syslog20211029 (-n stops sed from printing every line by default, the address range /pattern/,$ runs from the first matching line to the end of the file, and p prints the lines in that range).
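The wished-for start-at-first-match command can be sketched as a small shell function. This is only an illustration: the function name is borrowed from the question and is not an existing tool, and the sample log lines are made up. It uses awk rather than sed so that the pattern can be passed in as an argument.

```shell
# Hypothetical helper, named after the command wished for in the question.
# Prints every line from the first one matching the pattern to end of input.
# The pattern is treated as a regular expression by awk.
start_at_first_match() {
    pattern=$1
    shift
    awk -v pat="$pattern" '$0 ~ pat { found = 1 } found' "$@"
}

# Made-up sample log for the sketch.
log=$(mktemp)
cat > "$log" <<'EOF'
Oct 29 09:54:58 host app: my message (before the change)
Oct 29 09:55:00 host app: change applied
Oct 29 09:55:07 host app: my message (after the change)
Oct 29 10:02:13 host app: something else
EOF

after=$(start_at_first_match "Oct 29 09:55:00" "$log" | grep -F "my message")
echo "$after"

rm -f "$log"
```

Only the "after the change" line is printed. Like the sed version, this prints nothing if no line carries exactly that timestamp; for a single-day file, a string comparison on the time field such as awk '$3 >= "09:55:00"' is one possible way around that.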

how to exclude some of the matches from grep?

I am using grep to print the matching lines from a very large file.
I get hundreds of matches, and some of them are not interesting; I want to exclude those.
grep "WARNING" | grep -v "WARNING_HANDLING_THREAD" path # i tried this
When I grep the file for warning I get
0-00:00:33.392 (2127:127:250:02 = 21.278532 Fri Feb 1 10:17:22 2019) <3:0x000a>:[89]:[enter]: cest_handleFreeReq.c:116: [WARNING]: cest_handleFreeReq: sent from DECA ->UCS
0-00:00:38.263 (2189:022:166:06 = 21.891510 Fri Feb 1 10:17:28 2019) <3:0x000a>:[89]:[enter]: cest_handleConfigReq.c:176: [WARNING]: cest_handleConfigReq.c: GroupConfig NOT present.
0-00:00:38.263 (2189:022:167:03 = 21.891510 Fri Feb 1 10:17:28 2019) <3:0x000a>:[89]:[enter]: cest_handleConfigReq.c:194: [WARNING]: cest_handleConfigReq: physicalConfig NOT present.
60 0x6d77 0 0x504ea | 2 18 | 0 0 | 4 12 | 647 | 14685 0 0.0 0 500 500 | 0 | 0 | 38 | ETH_DRV_WARNING_HANDLING_thread
60 0 | 0 0 | 0 0 0 | 0 0 0 0 0 0 ! N/A N/A N/A N/A N/A N/A |ETH_DRV_WARNING_HANDLING_thread
WARNING: List of threads violating the heap & stack limit
I want to exclude the last lines, which are not interesting, so that only these remain:
0-00:00:33.392 (2127:127:250:02 = 21.278532 Fri Feb 1 10:17:22 2019) <3:0x000a>:[89]:[enter]: cest_handleFreeReq.c:116: [WARNING]: cest_handleFreeReq: sent from DECA ->UCS
0-00:00:38.263 (2189:022:166:06 = 21.891510 Fri Feb 1 10:17:28 2019) <3:0x000a>:[89]:[enter]: cest_handleConfigReq.c:176: [WARNING]: cest_handleConfigReq.c: GroupConfig NOT present.
0-00:00:38.263 (2189:022:167:03 = 21.891510 Fri Feb 1 10:17:28 2019) <3:0x000a>:[89]:[enter]: cest_handleConfigReq.c:194: [WARNING]: cest_handleConfigReq: physicalConfig NOT present.
Is there a way to do this using grep find or any other tool?
Thank you
Note that the substring thread is in lower case in the data, but in upper case in your expression.
Instead, use
grep -F 'WARNING' logfile | grep -F -v 'WARNING_HANDLING_thread'
The -F makes grep use string comparisons rather than regular expression matching (this is not really related to your current issue, but just a way of showing that we know what type of pattern we're matching with).
Another option would be to make the second grep do case insensitive matching with -i:
grep -F 'WARNING' logfile | grep -Fi -v 'WARNING_HANDLING_THREAD'
In this case though, I would probably match the [WARNING] tag instead:
grep -F '[WARNING]:' logfile
Note that here we need the -F so that grep interprets the pattern as a string and not as a regular expression matching any single character out of the W, A, R, N, I, G set, followed by a :.
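A short sketch of the difference, using two lines taken from the output in the question. Without -F, the pattern is a bracket expression followed by a colon, so it picks up the "G:" in the unwanted summary line and misses the tagged line, where the character before each colon is not in the set:

```shell
sample='cest_handleFreeReq.c:116: [WARNING]: cest_handleFreeReq: sent from DECA ->UCS
WARNING: List of threads violating the heap & stack limit'

# As a regex: one of W, A, R, N, I, G followed by ":".
as_regex=$(printf '%s\n' "$sample" | grep '[WARNING]:')

# As a fixed string: the literal text "[WARNING]:".
as_string=$(printf '%s\n' "$sample" | grep -F '[WARNING]:')

echo "regex match:  $as_regex"
echo "string match: $as_string"
```

The regex version returns only the "WARNING: List of threads" line, while the -F version returns only the line carrying the [WARNING]: tag.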

How do I grab a specific section of a stdout?

I am trying to grab the sda# of a drive that was just inserted.
tail -f /var/log/messages | grep sda:
Returns: Mar 12 17:21:55 raspberrypi kernel: [ 1133.736632] sda: sda1
I would like to grab the sda1 part of the stdout, how would I do that?
I suggest to use this with GNU grep:
| grep -Po 'sd[a-z]+: \Ksd[a-z0-9]+$'
\K: This sequence resets the starting point of the reported match. Any previously matched characters are not included in the final matched sequence.
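Where grep lacks -P (BSD grep, for example), a capture group in sed can do the same job. A minimal sketch, using the log line from the question as fixed input instead of a live tail -f:

```shell
line='Mar 12 17:21:55 raspberrypi kernel: [ 1133.736632] sda: sda1'

# Match "sdX: " followed by the partition name at the end of the line,
# and print only the captured partition name.
part=$(printf '%s\n' "$line" | sed -n 's/.*sd[a-z]*: \(sd[a-z0-9]*\)$/\1/p')
echo "$part"
```

This prints sda1. When the input comes from tail -f through an intermediate grep, that grep may need --line-buffered so that lines reach the next command promptly.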
See: The Stack Overflow Regular Expressions FAQ

Grep: Capture just number

I am trying to use grep to just capture a number in a string but I am having difficulty.
echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"
How am I supposed to just have it return "54"? I have tried the above grep command and it doesn't work.
echo "You have <strong>54</strong>" | grep -o '[0-9]' seems to sort of work but it prints
5
4
instead of 54
Don't parse HTML with regex; use a proper parser:
$ echo "There are <strong>54</strong> cities " |
xmllint --html --xpath '//strong/text()' -
OUTPUT:
54
Check RegEx match open tags except XHTML self-contained tags
You need to use the "E" option for extended regex support (or use egrep). On my Mac OSX:
$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
You also need to think about whether there can be more than one number in the line. What should the behavior be then?
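As a sketch of what happens with more than one number, here is the same command on a line containing two (the extra 12 is added for illustration). grep -o prints each match on its own line, so keeping only the first one takes an extra step such as head:

```shell
text='There are <strong>54</strong> 12 cities'

# Every run of digits, one per output line.
all=$(printf '%s\n' "$text" | grep -Eo '[0-9]+')

# Only the first run of digits.
first=$(printf '%s\n' "$text" | grep -Eo '[0-9]+' | head -n 1)

echo "$all"
echo "first: $first"
```

Here all holds 54 and 12 on separate lines, and first holds 54.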
EDIT 1: since you have now specified the requirement to be a number between <strong> tags, I would recommend using sed. On my platform, grep does not have the "P" option for perl style regexes. On my other box, the version of grep specifies that this is an experimental feature so I would go with sed in this case.
$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54
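Here -r enables extended regex in GNU sed (BSD/macOS sed spells it -E), and -n together with the p flag prints only the lines where the substitution succeeded.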
EDIT 2: If you have the "PCRE" option in your version of grep, you could also utilize the following with positive lookbehinds and lookaheads.
$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P "(?<=<strong>)\s*([0-9]+)\s*(?=<\/strong>)"
54

How to parse text file and retrieve only selected text?

In my log file I have the text in the following format:
18 Mar 2001 14:18:17,438 INFO DomainName1\EmpId1#Admin#3.1
18 Mar 2001 14:19:00,872 INFO DomainName2\EmpId2#User#1.3.2.0
18 Mar 2001 14:20:05,418 INFO DomainName3\EmpId3#Admin#4.3.1.0
I just want to get only the EmpId's.
What about something like
cat logfile | cut -d '#' -f 1 | cut -d '\' -f 2
(This assumes that you are on a Unix-like system, and also assumes that '#' and '\' won't pop up elsewhere than where you put them in your example.)
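The same extraction can also be sketched as a single sed command, under the same assumption about where '\' and '#' appear. The temporary file below stands in for the log file and holds the lines from the question:

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
18 Mar 2001 14:18:17,438 INFO DomainName1\EmpId1#Admin#3.1
18 Mar 2001 14:19:00,872 INFO DomainName2\EmpId2#User#1.3.2.0
18 Mar 2001 14:20:05,418 INFO DomainName3\EmpId3#Admin#4.3.1.0
EOF

# Capture everything between the backslash and the first "#" after it.
# With -n and the p flag, lines that do not fit the shape are skipped.
ids=$(sed -n 's/.*\\\([^#]*\)#.*/\1/p' "$log")
echo "$ids"

rm -f "$log"
```

This prints EmpId1, EmpId2 and EmpId3, one per line.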

Resources