Grep from text file over the past hour

I have several commands similar to:
ping -i 60 8.8.8.8 | while read pong; do echo "$(date): $pong" >> /security/latencytracking/pingcapturetest2.txt; done
Output:
Tue Feb 4 15:13:39 EST 2014: 64 bytes from 8.8.8.8: icmp_seq=0 ttl=50 time=88.844 ms
I then search the results using:
cat /security/latencytracking/pingcapturetest* | egrep 'time=........ ms|time=......... ms'
I am looking for latency anomalies over X ms.
Is there a better way to search than what I am doing, and can I restrict the search to the past 1, 2, 3, etc. hours instead of reading from the start of the file? This will get tedious over time.

You could add a Unix timestamp to your log and then search based on that:
ping -i 60 8.8.8.8 | while read pong; do
echo "$(date +"%s"): $pong" >> log.txt
done
Your log will have entries like:
1391548048: 64 bytes from 8.8.8.8: icmp_req=1 ttl=47 time=20.0 ms
Then search with a combination of date and awk:
Using GNU date (Linux, etc.):
awk -F: "\$1 > $(date -d '1 hour ago' +'%s')" log.txt
Or, using BSD date (macOS, BSD):
awk -F: "\$1 > $(date -j -v '-1H' +%s)" log.txt
The command uses date -d (or date -v for the same task on BSD/macOS) to translate an English time phrase into a Unix timestamp. awk then compares the logged timestamp (the first field, before the first :) with the generated timestamp and prints every log line with a higher value, i.e. a newer one.
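Since the original goal is finding latency anomalies over X ms, the time filter and the latency check can be combined in one awk pass. The sketch below assumes GNU date and the epoch-prefixed log format shown above; the function name slow_pings and its argument order are illustrative, not part of any tool.

```bash
# Hypothetical helper: slow_pings HOURS THRESHOLD_MS LOGFILE
# Prints entries newer than HOURS hours ago whose time= value exceeds THRESHOLD_MS.
# Assumes lines like "1391548048: 64 bytes from ... time=20.0 ms" and GNU date.
slow_pings() {
  local since
  since=$(date -d "$1 hours ago" +%s)
  awk -F: -v since="$since" -v thresh="$2" '
    # $1 is the epoch timestamp before the first colon
    $1 + 0 > since + 0 && match($0, /time=[0-9.]+/) {
      ms = substr($0, RSTART + 5, RLENGTH - 5)   # drop the "time=" prefix
      if (ms + 0 > thresh + 0) print
    }' "$3"
}
```

For example, slow_pings 3 100 log.txt prints entries from the last three hours that took longer than 100 ms. On BSD/macOS the date call would become date -j -v "-${1}H" +%s, as in the BSD command above.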

If you are familiar with R:
1. I'd slurp the whole thing in with read.table(), drop the unnecessary columns
2. then do whatever calculations you like
If you have tens of millions of records, though, R might be a bit slow.
Plan B:
1. use cut to strip anything you don't need, then go back to the plan above.
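For Plan B, a minimal cut pipeline might look like the following. It assumes the epoch-timestamped log format from the previous answer, where the latency is the eighth space-separated field; ping_log_to_csv is a hypothetical name. The resulting epoch,latency lines could then be loaded with read.table(sep = ",").

```bash
# Hypothetical helper: ping_log_to_csv LOGFILE
# Reduces "EPOCH: 64 bytes from HOST: icmp_req=N ttl=T time=X ms" lines to "EPOCH,X".
# Assumes time= is the 8th space-separated field.
ping_log_to_csv() {
  # cut keeps "EPOCH:" and "time=X"; sed rewrites them as "EPOCH,X" and,
  # because of -n ... p, silently drops lines without that shape.
  cut -d' ' -f1,8 "$1" | sed -n 's/^\([0-9]*\): time=\([0-9.]*\)$/\1,\2/p'
}
```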

You can also do it in bash by comparing dates, as follows:
Crop the date field and convert that date into the number of seconds since midnight UTC on 1 January 1970:
date -d "Tue Feb 4 15:13:39 EST 2014" '+%s'
Then compare that number against the same value for one hour ago:
reference=$(date --date='-1 hour' '+%s')
This way you get all records from the last hour. Then you can filter them by the length of the delay.
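Put together, the two steps above might look like the bash loop below. It is a sketch that assumes GNU date and the question's original log format (the output of date, then ": ", then the ping line). recent_slow_pings is a hypothetical name, and the latency check uses only the whole-millisecond part because bash arithmetic is integer-only. Running date once per line is slow on large logs; the single awk pass in the first answer scales better.

```bash
# Hypothetical helper: recent_slow_pings HOURS THRESHOLD_MS LOGFILE
# Assumes lines like "Tue Feb 4 15:13:39 EST 2014: 64 bytes from ... time=88.844 ms"
# and GNU date. Latency is compared on its whole-millisecond part only.
recent_slow_pings() {
  local hours=$1 thresh=$2 log=$3
  local reference line stamp secs ms
  reference=$(date --date="-$hours hour" '+%s')
  while IFS= read -r line; do
    stamp=${line%%: *}                        # text before the first ": " is the date
    secs=$(date -d "$stamp" '+%s' 2>/dev/null) || continue
    (( secs > reference )) || continue        # keep only records newer than the reference
    [[ $line =~ time=([0-9]+) ]] || continue  # integer part of the latency
    ms=${BASH_REMATCH[1]}
    if (( ms >= thresh )); then
      printf '%s\n' "$line"
    fi
  done < "$log"
}
```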

Related

How do I grep starting at a specific point in a log file?

I have daily syslog files which contain syslog messages in the format: MMM DD HH:MM:SS additional_data_here
We make a change, then want to see if the syslog messages continue.
For example, say a change was made at 09:55. I want to ignore everything prior to the first line that contains Oct 29 09:55:00. Then I want to grep for my error message after that first line match.
For this example, I have to create several different statements, like this:
grep -e "Oct 29 09:5[5-9]" syslog20211029 | grep "[my message]"
grep -e "Oct 29 1[0-1]:" syslog20211029 | grep "[my message]"
But I do this often enough that I'd like to find a better, more consistent way. Something like:
start-at-first-match "Oct 29 09:55:00" syslog20211029 | grep "[my message]"
But I don't know what the start-at-first-match option is. Any suggestions?
If you want to restrict yourself to grep, you can't really do this, but the -A num option can still meet your need if you give it a big number for num:
grep -A 10000000 "Oct 29 09:55:00" syslog20211029
This will print the matching line and the next 10 million.
If you want everything that follows the line for sure (without having to give an "unreachable" number of lines), you have to use another command, such as sed or awk. Using sed: sed -n '/Oct 29 09:55:00/,$ p' (-n stops sed from printing lines by default; the address range /pattern/,$ runs from the first line matching the pattern to the end of the file, $, and p prints those lines).
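If you want the start-at-first-match command from the question, a small shell function can wrap the same idea. This sketch uses awk's index() so the timestamp is matched as a fixed string rather than a regular expression; the function name is hypothetical.

```bash
# Hypothetical helper: start_at_first_match STRING FILE
# Prints every line from the first one containing STRING to the end of FILE.
start_at_first_match() {
  # once "found" is set, every following line is printed as well
  awk -v pat="$1" 'found || index($0, pat) { found = 1; print }' "$2"
}
```

Usage: start_at_first_match "Oct 29 09:55:00" syslog20211029 | grep "[my message]". Like the grep and sed versions, it prints nothing if no message was logged in that exact second; within a single day's file, comparing the time field as a string, e.g. awk '$3 >= "09:55:00"' syslog20211029, avoids that.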

Extract specific number from command output

I have the following issue.
In a script, I have to run the hdparm command on /dev/xvda1.
From the command output, I have to extract the calculated MB/sec values.
For example, running the command gives this output:
/dev/xvda1:
Timing cached reads: 15900 MB in 1.99 seconds = 7986.93 MB/sec
Timing buffered disk reads: 478 MB in 3.00 seconds = 159.09 MB/sec
I have to extract 7986.93 and 159.09.
I tried:
grep -o -E '[0-9]+', but it returns every number in the output, not just the two I need.
grep -o -E '[0-9]', but it returns single digits instead of whole values.
grep -o -E '[0-9]+$', but the output is empty, I suppose because the number is not at the end of the line.
How can I achieve my purpose?
To get the last number, you can add .* in front; it matches as much as possible, eating away all the other numbers. However, to exclude that part from the output, you need GNU grep (with -P), pcregrep, or sed.
grep -Po '.* \K[0-9.]+'
Or
sed -En 's/.* ([0-9.]+).*/\1/p'
Consider using awk to just print the fields you want rather than matching on numbers. This will work using any awk in any shell on every Unix box:
$ hdparm whatever | awk 'NF>1{print $(NF-1)}'
7986.93
159.09
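Since the question mentions a script, the awk idea can be wrapped so the rates are easy to capture. This is a sketch assuming the output layout shown above (each rate is the second-to-last field and the last field is MB/sec) and that the command is run as hdparm -tT, which prints both the cached and buffered timings; parse_hdparm_rates is a hypothetical name.

```bash
# Hypothetical helper: read hdparm timing output on stdin and print the
# MB/sec figures, one per line (cached reads first, then buffered reads).
parse_hdparm_rates() {
  # only lines ending in "MB/sec" carry a rate; it sits just before that unit
  awk '$NF == "MB/sec" { print $(NF-1) }'
}
```

For the output shown in the question, hdparm -tT /dev/xvda1 | parse_hdparm_rates prints 7986.93 and 159.09 on separate lines. Checking $NF also skips any other multi-field line the command might print.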

Bash script print the word after the match

I'm trying to check whether the system clock is synchronized from a bash script.
First I ran timedatectl, which outputs the following:
Local time: Mi 2021-03-17 12:52:53 CET
Universal time: Mi 2021-03-17 11:52:53 UTC
RTC time: Mi 2021-03-17 11:52:53
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
systemd-timesyncd.service active: yes
RTC in local TZ: no
The goal is to get the value of System clock synchronized, which is yes, assign it to a variable, and print it. So I did the following:
1-# check if time synchronized
2-syncTime="$(timedatectl | grep 'System clock')" # returns the whole line
3-echo "$syncTime" # make sure the line is saved
4-ifTimeSynched="$( $syncTime | grep -oP 'synchronized: \K\w+')" # this is supposed to save the word after synchronized:
echo "$ifTimeSynched"
Line 4 should get the word after synchronized:, which in this case is yes. However, when I print $ifTimeSynched as shown above, it prints an empty line.
Why doesn't it catch the word yes?
This is easy to do with awk. Search for the string System clock synchronized in the output of timedatectl (which is piped into awk) and print the desired field, which in your case is the last one ($NF).
ifTimeSynched=$(timedatectl | awk '/System clock synchronized/{print $NF}')
echo "$ifTimeSynched"
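The empty result comes from line 4: $( $syncTime | ... ) expands $syncTime and tries to run its first word, System, as a command, so grep receives no input. Pipe the variable's value into grep instead:
ifTimeSynched=$(echo "$syncTime" | grep -oP 'synchronized: \K\w+')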
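If you already have the line in $syncTime, bash parameter expansion can pull out the value without a second grep or PCRE support. The sketch below wraps that in a hypothetical clock_sync_value function that reads timedatectl-style text on stdin and strips everything up to the last ": ".

```bash
# Hypothetical helper: print the value after "System clock synchronized:"
# from timedatectl-style text on stdin; returns 1 if the line is missing.
clock_sync_value() {
  local line
  line=$(grep 'System clock synchronized') || return 1
  # ${line##*: } removes the longest prefix ending in ": ", leaving e.g. "yes"
  printf '%s\n' "${line##*: }"
}
```

Usage: if [ "$(timedatectl | clock_sync_value)" = yes ]; then echo "clock is synchronized"; fi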

GREP to columns along with comma separation

I'm grepping a bunch of files in a directory as below:
grep -EIho 'abc|def' *|sort|uniq -c >>counts.csv
My output is
150 abc
130 def
What I need is yesterday's date (the current date minus one day) followed by the grep counts, inserted into counts.csv like this:
5/21/2018 150,130
grep.. | sort | uniq -c | awk -v d="$(date -d '1 day ago' +%D)" 'NR==1{printf "%s",d}{printf "%s",","$1;}END{print ""}'
will do it.
With your example data, it gives:
05/21/18,150,130
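If you need exactly the format from the question (5/21/2018 150,130, with a space after the date and no zero padding), the awk step can be adjusted. This sketch assumes GNU date, whose %-m and %-d flags suppress zero padding; counts_line is a hypothetical name.

```bash
# Hypothetical helper: turn "uniq -c" output on stdin into "<yesterday> count1,count2,..."
# Assumes GNU date (for -d and the %-m / %-d no-padding flags).
counts_line() {
  awk -v d="$(date -d '1 day ago' +%-m/%-d/%Y)" '
    { counts = (NR == 1) ? $1 : counts "," $1 }   # join the count column with commas
    END { print d, counts }'                       # OFS puts one space after the date
}
```

Usage: grep -EIho 'abc|def' * | sort | uniq -c | counts_line >> counts.csv. A pattern with no matches at all does not appear in the uniq -c output, so its column is missing rather than 0.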

efficient way to parse vmstat output

I'm trying to parse vmstat output efficiently, preferably in awk or sed, and it should work on both Linux and HP-UX. For example, I would like to extract the CPU idle percentage ("92" in this case) from the following output:
$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
11 0 385372 101696 61704 650716 0 1 5 9 6 12 5 2 92 0
Unfortunately, the vmstat output differs between Linux distributions and HP-UX; columns can also vary in width and appear in a different order.
I tried to write a nice awk one-liner, but eventually ended up with a Python solution:
$ vmstat | python3 -c 'import sys; print(dict(zip(*[line.split() for line in sys.stdin][-2:])).get("id"))'
92
Do you know better way to parse mentioned output, to get number values of desired column name?
Using awk, you can do:
vmstat | awk '(NR==2){for(i=1;i<=NF;i++)if($i=="id"){getline; print $i}}'
This should get the value of the "id" column on Linux as well as on HP-UX or any other standard Unix system.
Tested on Linux, HP-UX and Solaris.
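A variation on the awk answer mirrors the Python dict(zip(...)) idea: find the column by its header name, then read that field from the last line. Taking the last line means that with an interval such as vmstat 1 3 you get the most recent sample rather than the first report (which on Linux shows averages since boot). This is a sketch; vmstat_col is a hypothetical name, and it assumes the header and value lines split into the same number of whitespace-separated fields.

```bash
# Hypothetical helper: vmstat_col NAME   (vmstat-style text on stdin)
# Remembers which field holds NAME in the header, then prints that field
# from the final input line.
vmstat_col() {
  awk -v name="$1" '
    { for (i = 1; i <= NF; i++) if ($i == name) col = i; last = $0 }
    END { if (col) { split(last, f); print f[col] } }'
}
```

Usage: vmstat | vmstat_col id, or vmstat 1 3 | vmstat_col id for the latest sample.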
If the id column is always the second-to-last field, a shorter version works:
$ vmstat | python3 -c 'import sys; print(sys.stdin.readlines()[-1].split()[-2])'
95

Resources