Grep for matching pattern but exclude particular string - grep

I have a file containing various log lines, and I want to grep for a pattern but exclude lines containing message:com.mycompany.excluded, so basically the following should be returned:
"The log found this message blah, message:com.blahblah"
"The log found this message blah2, message:com.blahblah2"
"The log found this message foobar, message:com.mycompany"
"The log found this message blah, message:com.mycompany.included"
but not :
"The log found this message blah, message:com.mycompany.excluded"
I'm using this pattern, but it does not exclude com.mycompany.excluded:
grep "The log found this message.*message:.*" "mylogs.txt"

awk '/The log found this message.*message:/ && !/com\.mycompany\.excluded/' "mylogs.txt"
There are various ways to make it more robust, but they're probably not necessary (depending on what the rest of your log file looks like).
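The same two-condition filter can also be sketched with grep alone by chaining two calls. This is only an illustration: filter_log is a made-up helper name, and it assumes the excluded package always appears as the literal text message:com.mycompany.excluded.

```shell
# Hypothetical helper: print the matching log lines, minus the excluded package.
filter_log() {
    # The first grep keeps the lines of interest; the second drops the
    # excluded ones. -F treats the exclusion as a fixed string, so the
    # dots are matched literally rather than as "any character".
    grep 'The log found this message.*message:' "$1" |
        grep -vF 'message:com.mycompany.excluded'
}
```

Called as filter_log mylogs.txt. Because the second grep is a plain substring test, anything that merely starts with com.mycompany.excluded is dropped as well.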

What you want here is sed, not grep:
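sed -i .bak '/com\.mycompany\.excluded/d' mylogs.txt
(The d command deletes every line that matches. The -i .bak edits the file in place and creates a backup of your original file with the extension .bak; this would leave you with mylogs.txt and mylogs.txt.bak, with mylogs.txt having the undesired lines removed. That spelling is for BSD/macOS sed; GNU sed wants the suffix attached, as in -i.bak.)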

Related

I want to count detected strings in file with grep, excluding a specific phrase

Hi I'm trying to search a log file for the following words and assign the number of matches to a variable as a number.
errors
error
fail
failure
can't
But I don't want to match the words error or errors if they're preceded by "No "
So ignore error if it's "no error" and ignore errors if it's "no errors"
Here's what I have so far
ErrorCheck=$(grep -vi "No errors" $LOGFILE | grep -ciE "error|fail|can't" $LOGFILE)
It's not working out for me unfortunately; any suggestions would be great.
PS I'm using microcore running busybox shell, so I have a slightly lean environment to work in.
All comments and suggestions welcome.
Thanks for your input.
Well, I think one problem is that the "-v" flag for grep will basically omit the entire line containing 'No errors' or any other string you specify. So if a single line contains both "no failures" and "can't" for example, you'd have a problem.
One possible (sort of janky) way to do this is to take two counts and subtract them: NUM_COUNTED minus NUM_NOTS = ErrorCheck. Keep in mind that -c counts matching lines, not individual matches, so a line holding both a "no failure" and an actual failure indicator still cancels out to zero.
NUM_COUNTED=$(grep -ciF "error
fail
can't" "$LOGFILE")
NUM_NOTS=$(grep -ciF 'no error
no fail' "$LOGFILE")
ErrorCheck=$((NUM_COUNTED - NUM_NOTS))
Alternatively, drop the "no error" lines first and count what is left. The second grep must read from the pipe, so it gets no file name:
ErrorCheck=$(grep -vi 'no error' "$LOGFILE" | grep -ciF "error
fail
can't")
The -F flag just tells grep to treat the patterns (error, fail, and can't) as fixed strings, one per line. The list is double-quoted because the apostrophe in can't would otherwise end a single-quoted string.
Hope this helps.
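To avoid throwing away a whole line just because it mentions "no errors", one option is to delete only that phrase before counting. This is a rough sketch, assuming sed is available in the BusyBox build; count_errors is a made-up name, and like grep -c it counts matching lines rather than individual matches.

```shell
# Hypothetical helper: count lines that still contain an error word
# after every "no error" / "no errors" phrase has been removed.
count_errors() {
    # Bracket expressions give case-insensitive matching without relying
    # on sed extensions; [Ss]* also covers the plural "errors".
    sed 's/[Nn][Oo] [Ee][Rr][Rr][Oo][Rr][Ss]*//g' "$1" |
        grep -ciE "error|fail|can't"
}
```

It would be used as ErrorCheck=$(count_errors "$LOGFILE"). A line such as "no errors but can't open file" is still counted once, because only the "no errors" part is stripped.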

PigLatin and print a message

I'm using the Grunt shell of Pig Latin and I am trying to print a simple message, like ECHO "result is :" in a shell, followed by the result given by the Pig script.
However, I have done all the searches and had no luck so far.
Echo returns an error, same as print.
I can't use UDFs...
You can DUMP the alias or STORE the alias into a file to see the alias values.
Refer :
http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_dump
http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_store

Search string occurrence and display directory wise count

We have an error log directory structure wherein we store all error log files for a particular day in datewise directories -
errorbackup/20150629/errorlogFile3453123.log.xml
errorbackup/20150629/errorlogFile5676934.log.xml
errorbackup/20150629/errorlogFile9812387.log.xml
errorbackup/20150628/errorlogFile1097172.log.xml
errorbackup/20150628/errorlogFile1908071_log.xml
errorbackup/20150627/errorlogFile5675733.log.xml
errorbackup/20150627/errorlogFile9452344.log.xml
errorbackup/20150626/errorlogFile6363446.log.xml
I want to search for a particular string in the error log file and get the output such that I will get directory wise search result of a count of that string's occurrence. For example grep "blahblahSQLError" should output something like-
20150629:0
20150628:0
20150627:1
20150626:1
This is needed because we fixed some errors in one of the releases and I want to make sure that there are no occurrences of that error since the day it was deployed to Prod. Also note that there are thousands of error log files created every day. Each error log file is created with a random number in its name to ensure uniqueness.
If you are sure the filenames of the log files will not contain any "odd" characters or newlines then something like the following should work.
for dir in errorbackup/*; do
printf '%s:%s\n' "${dir#*/}" "$(grep -l blahblahSQLError "$dir/"*.xml | wc -l)"
done
If they can have unexpected names then you would need to use multiple calls to grep and count the matching files manually, I believe. Something like this.
for dir in errorbackup/*; do
_dcount=0
for log in "$dir"/*.xml; do
grep -q blahblahSQLError "$log" && _dcount=$((_dcount + 1))
done
printf '%s:%s\n' "${dir#*/}" "$_dcount"
done
Something like this should do it:
for dir in errorbackup/*
do
awk -v dir="${dir##*/}" -v OFS=':' '/blahblahSQLError/{c++} END{print dir, c+0}' "$dir"/*
done
There's probably a cuter way to do it with find and xargs to avoid the loop and you could certainly do it all within one awk command but life's too short....
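For what it's worth, here is one way the find idea could look, keeping the loop but letting find hand the files to grep in batches. It is a sketch under assumptions: count_by_dir is a made-up name, it counts files that contain the string (like the grep -l version) rather than individual occurrences, and it still miscounts file names that contain newlines.

```shell
# Hypothetical helper. $1 = fixed string to look for, $2 = top-level directory.
count_by_dir() {
    for dir in "$2"/*/; do
        dir=${dir%/}
        # find batches the file names itself, so a directory holding
        # thousands of logs cannot overflow the command line.
        n=$(find "$dir" -type f -exec grep -lF -- "$1" {} + | wc -l | tr -d ' ')
        printf '%s:%s\n' "${dir##*/}" "$n"
    done
}
```

For the layout in the question it would be called as count_by_dir blahblahSQLError errorbackup, printing one directory:count line per dated directory.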

Parse through text file and write out data

I'm working on the first steps towards creating a powershell script that will read through printer logs (probably using get-WMI cmdlet), and parse through the logs. Afterwards, I plan on having the script output to a .txt file the name of the printer, a counter of the number of times a printer was used (if possible), and specific info found in the logs.
In order to do this, I've decided to try working backwards. Below is a small portion of what the logs will look like:
10 Document 81, A361058/GPR0000151814_1: owned by A361058 was printed on R3556 via port IP_***.***.***.***. Size in bytes: 53704; pages printed: 2 20130219123105.000000-300
10 Document 80, A361058/GPR0000151802_1: owned by A361058 was printed on R3556 via port IP_***.***.***.***. Size in bytes: 53700; pages printed: 2
Working backwards and just focusing on parsing first, I'd like to be able to specifically get the "/GPR", "R3556 (in general, R** as this is the printer name)", and get a counter that shows how often a specific printer appeared in the log files.
It has been a while since I last worked with Powershell, however at the moment this is what I've managed to create in order to try accomplishing my goal:
Select-String -Path "C:\Documents and Settings\a411882\My Documents\Scripts\Print Parse Test.txt" -Pattern "/GPR", " R****" -AllMatches -SimpleMatch
The code does not produce any errors, however I'm also unable to get any output to appear on screen to see if I'm capturing the /GPR and printer name. At the moment I'm trying to just ensure I'm gathering the right output before worrying about any counters. Would anyone be able to assist me and tell me what I'm doing wrong with my code?
Thanks!
EDIT: Fixed a small error with my code that was causing no data to appear on screen. At the moment this code outputs the entire two lines of test text instead of only outputting the /GPR and server name. The new output is the following:
My Documents\Scripts\Print Parse Test.txt:1:10 Document 81, A361058/GPR0000151814_1: owned by A361058 was printed on
R3556 via port IP_***.***.***.***. Size in bytes: 53704; pages printed: 2
20130219123105.000000-300
My Documents\Scripts\Print Parse Test.txt:2:10 Document 80, A361058/GPR0000151802_1: owned by A361058 was printed on
R3556 via port IP_***.***.***.***. Size in bytes: 53700; pages printed: 2
I'd like to try having it eventually look something like the following:
/GPR, R****, count: ## (although for now I'm less concerned about the counter)
You can try this. It only returns a line when /GPR (and "on" from "printed on") is present.
Get-Content .\test.txt | % {
if ($_ -match '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)') {
$_ -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
}
}
Output:
/GPR,R3556
/GPR,R3556
I'm sure there are better regex versions. I'm still learning it :-)
EDIT this is easier to read. The regex is still there for extraction, but I filter out lines with /GPR first using select-string instead:
Get-Content .\test.txt | Select-String -SimpleMatch -AllMatches -Pattern "/GPR" | % {
$_.Line -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
}
I generally start with an example of the line I'm matching, and build a regex from that, substituting regex metacharacters for the variable parts of the text. This makes the regex longer, but much more intuitive to read later.
Assign the regex to a variable, and then use that variable in subsequent code to keep the messy details of the regex from cluttering up the rest of the code:
[regex]$DocPrinted =
'Document \d\d, \w+/(\D{3})[0-9_]+: owned by \w+ was printed on (\w+) via port IP_[0-9.]+ Size in bytes: \d+; pages printed: \d+'
Get-Content <log file> |
foreach {
if ($_ -match $DocPrinted)
{
# -match has already filled $matches: [1] is the document prefix (e.g. GPR), [2] is the printer name
$matches
}
}

what is the shell script to read the log file and parse it, like get all the email adresses in the log file?

The log file contains many email addresses and I need to write a shell script to parse the log file and get all the email addresses. The log file's size is 1 GB, and my VPS server's RAM is just 512 MB, so I want to take performance into account. How can I do that?
If every line starts with an email address, you can use these commands. The first one selects the first 'word' of each line, and the second gives the unique values:
cut -f 1 -d ' ' LOGFILE.txt | sort -u
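If the addresses can appear anywhere in a line, a pattern-based extraction is one alternative. This is only a sketch: extract_emails is a made-up name, the regex is a loose approximation of an email address rather than a full validator, and -o is an extension found in GNU and BSD grep rather than something POSIX guarantees.

```shell
# Hypothetical helper: print each distinct email-like string found in file $1.
extract_emails() {
    # grep reads the file line by line and -o prints only the matched
    # text, so the 1 GB log is never held in memory as a whole;
    # sort -u then removes the duplicates.
    grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$1" | sort -u
}
```

Called as extract_emails LOGFILE.txt. Memory use is then driven by sort rather than grep, which matters mostly when the number of addresses is very large.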
