I'm using Zabbix to monitor a log file. I want to be notified whenever the regular expression 'error' has been inserted to the log.
The Item is:
log[/var/log/device-registry-service/DeviceRegistryService.log]
And this is the trigger:
svname1:log[/var/log/device-registry-service/DeviceRegistryService.log].iregexp(error)}=1
Problem is, usually when error occurs, the regex 'error' is repeating on the log's delta 10-20 times, which mean I get 10-20 mail notifications.
Can I configure it the notify me when it triggers, unless it was triggered in the last 5 minutes (or so)?
Just to make it clearer: if some arbitrary time the log delta is:
line1
line2
error1
line3
error2
line4
error3
I want it to trigger only once (or alternatively, configure the action to only once per hour)
Thanks!
edit:
since I want to know if any error has occurred, last() does not seem to help
Related
I need some sort of schedule thing to schedule a task to happen at x:y (12:00 for example) in Tcl.
The scenario is a router using Openwrt with Tcl 8.6.10 with limited RAM and storage where I have some sort of IRC client "bot" (using socket to connect). The "bot" was just a barebone that I modify to suit my needs. Most of the things work fine, except that I don't have way to schedule easily things. I wanted something like how eggdrop has "bind time" where the bind thing is "bind time flag "cron-style string" caller".
The "bot" scheme is like:
Main Tcl script:
<info+code to connect to IRC>
<while loop>
<some code in case of IRC disconnection>
<list of files with tcl code aka sub-scripts>
<usage of source based from a list of the filenames>
<code for error handling>
<end of while loop>
The list of files is source filelist.tcl, where filelist.tcl is a set var {filename1.tcl filename2.tcl...}. The filenamex.tcl has some basic code to respond to IRC server or IRC input from channels and reply to channels.
I can make some sort of schedule if I base a execution like if {[clock format [clock seconds] -format "%H:%M"]=="12:00"} {code to execute} and hopefully wait for a server ping/pong but that can lead to repeated code inside of the if body.
I been looking around and found a package called cron but I don't know how to use it correctly because there are not many examples and I don't know to use vwait properly and I don't want vwait to hang the bot waiting for a value to change. I also read about tcl threads for maybe parallel execution.
So I need some code inside of a sub-script that looks like (a package cron style):
#beginning of file
#add a task specifying hour and minute
task-at "12:00" proccaller
proc procname {optional} {
<some code to be executed at specific hour+time>
}
#end of file
I also don't know how to use after command to use it.
How can I accomplish I want?
Thanks for the replies and yes, it would help if I study event loops and coroutine, which probably comes next.
Some time has passed since I posted the question and kinda sorted the thing by creating a sub-script in a folder named scripts with the following structure:
#beginning of the script
if {![file exists executed]} {set executed "no"}
#the following clock instruction returns for example: Tuesday 22:14
switch -glob -- [clock format [clock seconds] -format "%A %H:%M"] {
"*12:00" - "*12:01" {
#Basic example of sending a message to the irc channel when it's midday
if {$executed=="no"} {
puts $fd "PRIVMSG #CODE :It's midday right now."
flush $fd
set executed "yes"
}
}
#...more time comparisions and code
default {set executed "no"}
}
#end of script
And the script is almost the top of the list of scripts to be loaded so if I wish to send some command down stream at giving time, the command can be executed.
There is double timings because the "bot" reacts, at least at minimum, to the irc server's ping which happens each 90 seconds and it may skip some minutes.
This is not an answer but an unproper workaround.
I am trying the hot code feature of erlang following the guide from LYAE but i do not understand how to make the update message to get triggered.
I have a module which runs a method that is upgradeable:
Module
-module(upgrade).
-export([main/1,upgrade/1,init/1,init_link/1]).
-record(state,{ version=0,comments=""}).
init(State)->
spawn(?MODULE,main,[State]).
main(State)->
receive
update->
NewState=?MODULE:upgrade(State),
if NewState#state.version>3 -> exit("Max Version Reached") end,
?MODULE:main(NewState);
SomeMessage->
main(State)
end.
upgrade(State=#state{version=Version,comments=Comments})->
Comm=case Version rem 2 of
0 -> "Even version";
_ -> "Uneven version"
end,
#state{version=Version+1,comments=Comm}.
Shell
>c(upgrade).
>rr(upgrade,state).
>U=upgrade:init(#state{version=0,comments="initial"}).
>Monitor=monitor(process,U).
> ......to something to trigger the update message
> flush(). % see the exit message reason
I do not understand how can i perform a hot code reload in order to trigger the update message.
I want when i use flush to get the exit reason from my main method.
The process expects to get the atom update as a message. Since you have the pid of the process in the variable U, you can send the message like this:
U ! update.
Note that the strings Even version and Uneven version are only kept in the state, never printed, so you won't see those. The only thing you'll see is the exit message, after sending update four times and calling flush().
I am trying to configure a trigger in Zabbix in order to monitore a simple eventLog from a Windows server. The trigger works and an alarm raises but after 30s without this event it should get back to normal. But the problem is it never gets back to normal.
Here is the expression
{SERVER1:eventlog[Application,,,,15007,,skip].logeventid(15007)}=1
and
Recovery Expression
{SERVER1:eventlog[Application,,,,15007,,skip].nodata(30)}=1
Try this:
Here is the expression
{SERVER1:eventlog[Application,,,,15007].logeventid(15007)}=1
and {SERVER1:eventlog[Application,,,,15007].nodata(30)}=0
Recovery Expression
{SERVER1:eventlog[Application,,,,15007].nodata(30)}=1
I am new to zabbix. I have a basic requirement of monitoring occurrence of different log messages using zabbix. Say, when there is a log message "server starting", zabbix should show that alert. The idea is that if the server (re)starts 10 times in last 10 minutes, the zabbix dashboard (or at any other place) should display that 10 times.
I have done the following for that :
Created an item under template MyTemplate:
Type : Zabbix Agent (Active)
key : log[/opt/mylog/logs/abc.log,server starting]
Type of information : Log
Update Interval (in sec) : 30
Created a trigger with expression :
{MyTemplate:log[/opt/mylog/logs/abc.log,server
starting].logeventid(1)}=0
With logeventid(1), I am seeing that the alert (trigger) is getting generated only once. It appears only once in the Dashboard --> Last 20 issues. If I go to Monitoring --> Trigger, I see the alert only once, although the log files have 10 entries of the message "server starting" (server restarted 10 times).
Then I set the trigger to following :
{MyTemplate:log[/opt/mylog/logs/abc.log,server
starting].nodata(300)}=0
Now, at Monitoring --> Trigger, I see the alert (trigger) 10 times, but, from the Dashboard --> Last 20 issues it vanishes just after 300 seconds.
My questions are :
What should be the trigger function, I should use? I want to see 10 alerts in zabbix if the same message appears 10 times in the log file within a period of time.
With nodata(300), why does the alert vanish after 300 sec?
Is it ok if I use 30 minutes instead of 300 seconds as an argument of nodata()?
Function logeventid() is normally used for Windows and VMware event logs. In this case, it should probably not be used and it is suspicious that it fires, which might indicate a bug in Zabbix.
Anyway, you can check "Multiple PROBLEM events generation" box in trigger configuration and the trigger will generate a new PROBLEM event every time the condition is true, regardless of its previous value. Instead of logeventid(), you can try using a function that is always true, for instance, strlen()>0.
If you wish the trigger to go into OK state after some time, say, 10 minutes, you can add nodata(10m). Then your trigger will look like this:
{MyTemplate:log[/opt/mylog/logs/abc.log,server starting].strlen()}>0 and
{MyTemplate:log[/opt/mylog/logs/abc.log,server starting].nodata(10m)}=0
I wrote a admin script that tails a heroku log and every n seconds, it summarizes averages and notifies me if i cross a certain threshold (yes I know and love new relic -- but I want to do custom stuff).
Here is the entire script.
I have never been a master of IO and threads, I wonder if I am making a silly mistake. I have a couple of daemon threads that have while(true){} which could be the culprit. For example:
# read new lines
f = File.open(file, "r")
f.seek(0, IO::SEEK_END)
while true do
select([f])
line = f.gets
parse_heroku_line(line)
end
I use one daemon to watch for new lines of a log, and the other to periodically summarize.
Does someone see a way to make it less processor-intensive?
This probably runs hot because you never really block while reading from the temporary file. IO::select is a thin layer over POSIX select(2). It looks like you're trying to block until the file is ready for reading, but select(2) considers EOF to be ready ("a file descriptor is also ready on end-of-file"), so you always return right away from select then call gets which returns nil at EOF.
You can get a truer EOF reading and nice blocking behavior by avoiding the thread which writes to the temp file and instead using IO::popen to fork the %x[heroku logs --ps router --tail --app pipewave-cedar] log tailer, connected to a ruby IO object on which you can loop over gets, exiting when gets returns nil (indicating the log tailer finished). gets on the pipe from the tailer will block when there's nothing to read and your script will only run as hot as it takes to do your line parsing and reporting.
EDIT: I'm not set up to actually try your code, but you should be able to replace the log tailer thread and your temp file read loop with this code to get the behavior described above:
IO.popen( %w{ heroku logs --ps router --tail --app my-heroku-app } ) do |logf|
while line = logf.gets
parse_heroku_line(line) if line =~ /^/
end
end
I also notice your reporting thread does not do anything to synchronize access to #total_lines, #total_errors, etc. So, you have some minor race conditions where you can get inconsistent values from the instance vars that parse_heroku_line method updates.
select is about whether a read would block. f is just a plain old file, so you when get to the end reads don't block, they just return nil instantly. As a result select returns instantly rather than waiting for something to be appending to the file as I assume you're expecting. Because of this you're sitting in a tight busy loop, so high cpu is to be expected.
If you are at eof (you could either check f.eof? or whether gets returns nil), then you could either start sleeping (perhaps with some sort of back off) or use something like listen to be notified of filesystem changes