Search string occurrence and display directory wise count - grep

We have an error log directory structure in which all error log files for a particular day are stored in date-wise directories:
errorbackup/20150629/errorlogFile3453123.log.xml
errorbackup/20150629/errorlogFile5676934.log.xml
errorbackup/20150629/errorlogFile9812387.log.xml
errorbackup/20150628/errorlogFile1097172.log.xml
errorbackup/20150628/errorlogFile1908071_log.xml
errorbackup/20150627/errorlogFile5675733.log.xml
errorbackup/20150627/errorlogFile9452344.log.xml
errorbackup/20150626/errorlogFile6363446.log.xml
I want to search the error log files for a particular string and get the number of occurrences of that string per directory. For example, grep "blahblahSQLError" should output something like:
20150629:0
20150628:0
20150627:1
20150626:1
This is needed because we fixed some errors in one of the releases, and I want to make sure that there have been no occurrences of that error since the day it was deployed to Prod. Also note that thousands of error log files are created every day. Each error log file is created with a random number in its name to ensure uniqueness.

If you are sure the filenames of the log files will not contain any "odd" characters or newlines then something like the following should work. Note that it counts the files containing the string rather than the individual occurrences.
for dir in errorbackup/*; do
    printf '%s:%s\n' "${dir#*/}" "$(grep -l blahblahSQLError "$dir/"*.xml | wc -l)"
done
If they can have unexpected names then you would need to call grep once per file and count the matching files manually, I believe. Something like this.
for dir in errorbackup/*; do
    _dcount=0
    for log in "$dir"/*.xml; do
        # -q reports a match through the exit status only, printing nothing
        grep -q blahblahSQLError "$log" && _dcount=$((_dcount + 1))
    done
    printf '%s:%s\n' "${dir#*/}" "$_dcount"
done

Something like this should do it:
for dir in errorbackup/*
do
    awk -v dir="${dir##*/}" -v OFS=':' '/blahblahSQLError/{c++} END{print dir, c+0}' "$dir"/*
done
There's probably a cuter way to do it with find and xargs to avoid the loop and you could certainly do it all within one awk command but life's too short....
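For what it's worth, here is a rough sketch of the find-based variant hinted at above. It assumes a find that supports -mindepth/-maxdepth and a grep that supports -H (GNU and BSD versions do), and that no path contains a newline. The function name count_by_dir and its two parameters are made up for illustration. Since grep -c counts matching lines per file, the totals are matching lines per directory, like the awk loop.

```shell
# Hypothetical helper: count_by_dir PATTERN ROOT
# Prints "<directory>:<number of matching lines>" for each date directory
# under ROOT that contains at least one .xml file, newest first.
count_by_dir() {
    pattern=$1
    root=$2
    # grep -c prints "path:count" for every file, including files with 0 matches;
    # -F treats the pattern as a fixed string, -H forces the path prefix.
    find "$root" -mindepth 2 -maxdepth 2 -type f -name '*.xml' \
        -exec grep -cHF -- "$pattern" {} + |
    awk -F: '{
        path = $0
        sub(/:[0-9]+$/, "", path)        # drop the trailing ":count"
        n = split(path, parts, "/")      # .../<date dir>/<file>
        total[parts[n - 1]] += $NF       # $NF is the per-file count
    }
    END { for (d in total) print d ":" total[d] }' |
    sort -r
}
```

It would be called as count_by_dir blahblahSQLError errorbackup. Directories that contain no .xml files at all produce no line.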

Related

Grep for matching pattern but exclude particular string

I have a file containing various log lines, and I want to grep for a pattern but exclude lines with message:com.mycompany.excluded, so basically the following should be returned:
"The log found this message blah, message:com.blahblah"
"The log found this message blah2, message:com.blahblah2"
"The log found this message foobar, message:com.mycompany"
"The log found this message blah, message:com.mycompany.included"
but not :
"The log found this message blah, message:com.mycompany.excluded"
I'm using this pattern, but it does not exclude com.mycompany.excluded:
grep "The log found this message.*message:.*" "mylogs.txt"
awk '/The log found this message.*message:/ && !/com.mycompany.excluded/' "mylogs.txt"
There are various ways to make it more robust, but they're probably not necessary (depending on what the rest of your log file looks like).
What you want here is sed, not grep:
sed -i.bak '/com\.mycompany\.excluded/d' mylogs.txt
(The -i.bak edits the file in place and keeps a backup of your original file with the extension .bak; this would leave you with mylogs.txt and mylogs.txt.bak, with mylogs.txt having the undesired lines removed.)
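If the goal is only to filter the output and leave the file untouched, two chained grep calls are another option. This is a minimal sketch; the function name filter_log is made up for illustration. The second grep uses -v to invert the match and -F to treat the excluded text as a fixed string, so the dots are not regex wildcards. It would also drop a longer name that merely starts with com.mycompany.excluded.

```shell
# Hypothetical helper: filter_log FILE
# Keeps lines matching the wanted pattern, then drops the excluded package.
filter_log() {
    grep 'The log found this message.*message:' "$1" |
        grep -vF 'message:com.mycompany.excluded'
}
```

Usage would be filter_log mylogs.txt.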

mongoexport --collection too many positional options

I am getting this error when I run mongoexport query.
too many positional arguments: [—-collection thermal_comfort_collection]
mongoexport --db gccdb —-collection thermal_comfort_collection --type=csv --fields Timestamp,Temperature,User,ThermalComfort --query '{settingID: ObjectId("58992333441be20c7f834868")}' --out thermal_comfort_103060.csv
I've tried 'thermal_comfort_collection' and "thermal_comfort_collection", however, both gave me the same error. How should I fix it?
My issue stemmed from copying and pasting from a different editor (Evernote in this specific case). It transformed my '--' into a long '—', in fact your copying and pasting of the issue looks very similar.
Try deleting the dashes before "collection" and retyping them by hand as two plain hyphens (--).
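One way to catch this kind of paste damage before running a command is to check the text for non-ASCII bytes. A minimal sketch, with a made-up function name has_non_ascii: it deletes every 7-bit ASCII byte with tr and reports whether anything is left over.

```shell
# Hypothetical helper: has_non_ascii STRING
# Succeeds (exit status 0) if STRING contains any byte outside 7-bit ASCII,
# such as the UTF-8 encoding of a typographic dash.
has_non_ascii() {
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
    [ -n "$leftover" ]
}
```

For example, has_non_ascii "$cmd" && echo "suspicious characters found", where cmd is a hypothetical variable holding the pasted command line.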

jq substring gives "jq: error: Cannot index string with object"

Problem
I'm trying to filter a JSON result from jq to show only a substring of the original string. For example, if a jq filter grabbed the value
4ffceab674ea8bb5ec421c612536696839bbaccecf64e851dfc270d795ee55d1
I want it to only return the first 10 characters 4ffceab674.
What I've tried
On the Official JQ website you can find an example that should give me what I need:
Command: jq '.[2:4]'
Input: "abcdefghi"
Output: "cd"
I've tried to test this out with a simple example in the unix terminal:
# this works fine, => "abcdefghi"
echo '"abcdefghi"' | jq '.'
# this doesn't work => jq: error: Cannot index string with object
echo '"abcdefghi"' | jq '.[2:4]'
So, it turns out most of these filters are not yet in the released version. For reference see issue #289
What you could do is download the latest development version and compile from source. See download page > From source on Linux
After that, if indexing still doesn't work for strings, you should, at least, be able to do explode, index, implode combination, which seems to have been your plan.
Looking at the jq-1.3 manual, I suspect there isn't a solution using that version since it offers no primitives for extracting parts of a string.
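If upgrading jq is not an option, the trimming can be done outside jq instead. A minimal sketch, assuming the value has already been printed as a raw string without the surrounding quotes (for example via jq's -r option); first_chars is a made-up name, and this relies on cut rather than on any jq feature.

```shell
# Hypothetical helper: first_chars N STRING
# Prints the first N characters of STRING using cut.
first_chars() {
    printf '%s\n' "$2" | cut -c "1-$1"
}
```

In a pipeline, the same idea would be something like jq -r '.someField' input.json | cut -c 1-10, where .someField and input.json are placeholders.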

Parse through text file and write out data

I'm working on the first steps towards creating a powershell script that will read through printer logs (probably using get-WMI cmdlet), and parse through the logs. Afterwards, I plan on having the script output to a .txt file the name of the printer, a counter of the number of times a printer was used (if possible), and specific info found in the logs.
In order to do this, I've decided to try working backwards. Below is a small portion of what the logs will look like:
10 Document 81, A361058/GPR0000151814_1: owned by A361058 was printed on R3556 via port IP_***.***.***.***. Size in bytes: 53704; pages printed: 2 20130219123105.000000-300
10 Document 80, A361058/GPR0000151802_1: owned by A361058 was printed on R3556 via port IP_***.***.***.***. Size in bytes: 53700; pages printed: 2
Working backwards and just focusing on parsing first, I'd like to be able to specifically get the "/GPR" and the printer name "R3556" (in general, R**), and get a counter that shows how often a specific printer appeared in the log files.
It has been a while since I last worked with Powershell, however at the moment this is what I've managed to create in order to try accomplishing my goal:
Select-String -Path "C:\Documents and Settings\a411882\My Documents\Scripts\Print Parse Test.txt" -Pattern "/GPR", " R****" -AllMatches -SimpleMatch
The code does not produce any errors; however, I'm also unable to get any output to appear on screen to see if I'm capturing the /GPR and printer name. At the moment I'm trying to just ensure I'm gathering the right output before worrying about any counters. Would anyone be able to assist me and tell me what I'm doing wrong with my code?
Thanks!
EDIT: Fixed a small error with my code that was causing no data to appear on screen. At the moment this code outputs the entire two lines of test text instead of only outputting the /GPR and server name. The new output is the following:
My Documents\Scripts\Print Parse Test.txt:1:10 Document 81, A361058/GPR0000151814_1: owned by A361058 was printed on
R3556 via port IP_***.***.***.***. Size in bytes: 53704; pages printed: 2
20130219123105.000000-300
My Documents\Scripts\Print Parse Test.txt:2:10 Document 80, A361058/GPR0000151802_1: owned by A361058 was printed on
R3556 via port IP_***.***.***.***. Size in bytes: 53700; pages printed: 2
I'd like to try having it eventually look something like the following:
/GPR, R****, count: ## (although for now I'm less concerned about the counter)
You can try this. It only returns a line when /GPR (and "on" from "printed on") is present.
Get-Content .\test.txt | % {
    if ($_ -match '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)') {
        $_ -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
    }
}
Output:
/GPR,R3556
/GPR,R3556
I'm sure there are better regex versions. I'm still learning it :-)
EDIT: this is easier to read. The regex is still there for extraction, but I first select the lines containing /GPR using Select-String instead:
Get-Content .\test.txt | Select-String -SimpleMatch -AllMatches -Pattern "/GPR" | % {
    $_.Line -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
}
I generally start with an example of the line I'm matching, and build a regex from that, substituting regex metacharacters for the variable parts of the text. This makes the regex longer, but much more intuitive to read later.
Assign the regex to a variable, and then use that variable in subsequent code to keep the messy details of the regex from cluttering up the rest of the code:
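[regex]$DocPrinted =
'Document \d\d, \w+/(\D{3})[0-9_]+: owned by \w+ was printed on (\w+) via port IP_[0-9.]+ Size in bytes: \d+; pages printed: \d+'
get-content <log file> |
foreach {
    # A successful -match fills the automatic $matches table,
    # so no second match is needed inside the block.
    if ($_ -match $DocPrinted)
    {
        # $matches[1] is the three-letter code, $matches[2] the printer name
        $matches
    }
}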
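None of the above produces the per-printer counter the question asks about. As a rough illustration of the counting step only, here is a sketch in shell and awk rather than PowerShell. It assumes the log lines look like the two samples in the question, with the printer name being the word right after "printed on". The function name count_printers is made up.

```shell
# Hypothetical helper: count_printers FILE
# For lines that mention /GPR, takes the word after "printed on" as the
# printer name and prints "/GPR,<printer>,count:<n>" per printer.
count_printers() {
    awk '/\/GPR/ {
        for (i = 2; i < NF; i++) {
            if ($(i - 1) == "printed" && $i == "on") {
                count[$(i + 1)]++
                break
            }
        }
    }
    END { for (p in count) printf "/GPR,%s,count:%d\n", p, count[p] }' "$1" |
    sort
}
```

The same idea carries over to PowerShell by incrementing a hashtable entry keyed on the captured printer name.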

What is the shell script to read a log file and parse it, e.g. to get all the email addresses in the log file?

The log file contains many email addresses and I need to write a shell script to parse the log file and get all the email addresses. The log file's size is 1 GB, and my VPS server's RAM is just 512 MB, so I want to take performance into account. How can I do that?
If every line starts with an email address, you can use this pipeline. The first command selects the first 'word' of each line, and the second keeps only the unique values:
cut -f 1 -d ' ' LOGFILE.txt | sort -u
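If the addresses can appear anywhere in a line, a grep-based variant may work instead. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD versions do); the function name extract_emails is made up, and the regular expression is only a rough approximation of what an email address looks like. grep reads the file line by line, so the 1 GB input is never held in memory at once, and implementations such as GNU sort spill to temporary files when their buffer fills up.

```shell
# Hypothetical helper: extract_emails FILE
# Prints each distinct email-like token found anywhere in FILE, one per line.
extract_emails() {
    # Rough approximation of an address, not a full RFC 5322 matcher.
    grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1" |
        LC_ALL=C sort -u
}
```

Usage would be extract_emails LOGFILE.txt > emails.txt.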

Resources