Multiple times trigger generation in Zabbix - monitoring

I am new to zabbix. I have a basic requirement of monitoring occurrence of different log messages using zabbix. Say, when there is a log message "server starting", zabbix should show that alert. The idea is that if the server (re)starts 10 times in last 10 minutes, the zabbix dashboard (or at any other place) should display that 10 times.
I have done the following for that :
Created an item under template MyTemplate:
Type : Zabbix Agent (Active)
key : log[/opt/mylog/logs/abc.log,server starting]
Type of information : Log
Update Interval (in sec) : 30
Created a trigger with expression :
{MyTemplate:log[/opt/mylog/logs/abc.log,server
starting].logeventid(1)}=0
With logeventid(1), I am seeing that the alert (trigger) is getting generated only once. It appears only once in the Dashboard --> Last 20 issues. If I go to Monitoring --> Trigger, I see the alert only once, although the log files have 10 entries of the message "server starting" (server restarted 10 times).
Then I set the trigger to following :
{MyTemplate:log[/opt/mylog/logs/abc.log,server
starting].nodata(300)}=0
Now, at Monitoring --> Trigger, I see the alert (trigger) 10 times, but, from the Dashboard --> Last 20 issues it vanishes just after 300 seconds.
My questions are :
What should be the trigger function, I should use? I want to see 10 alerts in zabbix if the same message appears 10 times in the log file within a period of time.
With nodata(300), why does the alert vanish after 300 sec?
Is it ok if I use 30 minutes instead of 300 seconds as an argument of nodata()?

Function logeventid() is normally used for Windows and VMware event logs. In this case, it should probably not be used and it is suspicious that it fires, which might indicate a bug in Zabbix.
Anyway, you can check "Multiple PROBLEM events generation" box in trigger configuration and the trigger will generate a new PROBLEM event every time the condition is true, regardless of its previous value. Instead of logeventid(), you can try using a function that is always true, for instance, strlen()>0.
If you wish the trigger to go into OK state after some time, say, 10 minutes, you can add nodata(10m). Then your trigger will look like this:
{MyTemplate:log[/opt/mylog/logs/abc.log,server starting].strlen()}>0 and
{MyTemplate:log[/opt/mylog/logs/abc.log,server starting].nodata(10m)}=0

Related

Way to get some sort of schedule in TCL without blocking on-going code

I need some sort of schedule thing to schedule a task to happen at x:y (12:00 for example) in Tcl.
The scenario is a router using Openwrt with Tcl 8.6.10 with limited RAM and storage where I have some sort of IRC client "bot" (using socket to connect). The "bot" was just a barebone that I modify to suit my needs. Most of the things work fine, except that I don't have way to schedule easily things. I wanted something like how eggdrop has "bind time" where the bind thing is "bind time flag "cron-style string" caller".
The "bot" scheme is like:
Main Tcl script:
<info+code to connect to IRC>
<while loop>
<some code in case of IRC disconnection>
<list of files with tcl code aka sub-scripts>
<usage of source based from a list of the filenames>
<code for error handling>
<end of while loop>
The list of files is source filelist.tcl, where filelist.tcl is a set var {filename1.tcl filename2.tcl...}. The filenamex.tcl has some basic code to respond to IRC server or IRC input from channels and reply to channels.
I can make some sort of schedule if I base a execution like if {[clock format [clock seconds] -format "%H:%M"]=="12:00"} {code to execute} and hopefully wait for a server ping/pong but that can lead to repeated code inside of the if body.
I been looking around and found a package called cron but I don't know how to use it correctly because there are not many examples and I don't know to use vwait properly and I don't want vwait to hang the bot waiting for a value to change. I also read about tcl threads for maybe parallel execution.
So I need some code inside of a sub-script that looks like (a package cron style):
#beginning of file
#add a task specifying hour and minute
task-at "12:00" proccaller
proc procname {optional} {
<some code to be executed at specific hour+time>
}
#end of file
I also don't know how to use after command to use it.
How can I accomplish I want?
Thanks for the replies and yes, it would help if I study event loops and coroutine, which probably comes next.
Some time has passed since I posted the question and kinda sorted the thing by creating a sub-script in a folder named scripts with the following structure:
#beginning of the script
if {![file exists executed]} {set executed "no"}
#the following clock instruction returns for example: Tuesday 22:14
switch -glob -- [clock format [clock seconds] -format "%A %H:%M"] {
"*12:00" - "*12:01" {
#Basic example of sending a message to the irc channel when it's midday
if {$executed=="no"} {
puts $fd "PRIVMSG #CODE :It's midday right now."
flush $fd
set executed "yes"
}
}
#...more time comparisions and code
default {set executed "no"}
}
#end of script
And the script is almost the top of the list of scripts to be loaded so if I wish to send some command down stream at giving time, the command can be executed.
There is double timings because the "bot" reacts, at least at minimum, to the irc server's ping which happens each 90 seconds and it may skip some minutes.
This is not an answer but an unproper workaround.

Windowing and Watermark in Apache beam : Google dataflow

I have a fixed window of 1 minute. I am considering event time.
beam.WindowInto(window.FixedWindows(300))
When I deploy this code ,is the window created instantly even if I have not published any message .suppose I deployed at 6:30 , is it like the windows are automatically created as 6:30 to 6:35, 6:35 to 6:40 and so on ?
If I publish a message to topic having
event timestamp = 6:31 (unix seconds i.e 10,176589653)
when system time = 6:36
..does it mean the watermark for that specific message is at 6:31 and it will miss the window as system time is at 6:36 and allowed lateness=0 and will be rejected.
Windows are always created using UNIX time 0 as a base, meaning, no matter if you start the pipeline at 6:31, 6:32 or 6:35, the windows would always be [6:30, 6:35), [6:35, 6:40).... Note that this also applies for days, the windows would start at 00:00 UTC.
If you want to change this, there's an offset parameter.

Cypress Time out not respecting the value given on command

Since our test agents are slow some times- i am trying to add some additional time outs for some commands
I did it like using time out value on command as shown below.But its not respecting the value given
My understanding is cypress will wait for "10000" MS for getting the #Addstory element?
Can any one advice is this is the correct way please?
Thank you so much
cy.get('#addstory > .ng-scope').click({ timeout: 10000 })
In cypress.json file, increase the timeout to 10 seconds or what ever timeout you want like this: "defaultCommandTimeout": 10000 and save the file. Now close the app and open it again. Navigate to Settings > Configuration you should be able to see the new value set for defaultCommandTimeout.
I issue was i was adding time out on click not for getting the element when i changed like below -All good waiting for add story to be visible as i expected before click
cy.get('#addstory > .ng-scope',{ timeout: 10000 }).click()

How to make Postman/Newman to fail a test after certain time has passed?

So, in my collection I have about ten requests, with the last two being:
/Wait 10 seconds
/Check Complete
The first makes a call to the postman's echo (delay by 10 seconds) and the second is the call to my system to check for the status complete. Now, if status is unavailable I wait another 10s:
postman.setNextRequest("Wait 10 seconds");
The complete status on my system can appear in a minute or so. Now, as one can see - it is an infinite loop if something goes wrong with the system and status is never complete. Is there a way in postman/newman test to fail a test if it has been going for more than 2 minutes, for example.
Additionally, this will be executed in jenkins with command line, so I am not really looking into postman settings or delays between requests in the runner.
you may have a look to newman options here : https://www.npmjs.com/package/newman#newman-run-collection-file-source-options. The interesting option is
--timeout-request : it will surely fulfill your need.
In Postman itself, you may test the responseTime. I recall that there is a snippet, on the right part, which looks like this:
tests["Response time is less than 200ms"] = responseTime < 200;
and which could help you as the test fails if response does not occur within the requested time.
Alexandre
If you are going to be using Jenkins pipeline you can use the timeout step to cause long running jobs to result in failure, here's on for 2 mins.
timeout(120) {
node {
sh 'newman command'
}
}
Check out the "Pipeline Syntax" editor in Jenkins to generated your code block and look for other useful functions.

Zabbix log trigger

I'm using Zabbix to monitor a log file. I want to be notified whenever the regular expression 'error' has been inserted to the log.
The Item is:
log[/var/log/device-registry-service/DeviceRegistryService.log]
And this is the trigger:
svname1:log[/var/log/device-registry-service/DeviceRegistryService.log].iregexp(error)}=1
Problem is, usually when error occurs, the regex 'error' is repeating on the log's delta 10-20 times, which mean I get 10-20 mail notifications.
Can I configure it the notify me when it triggers, unless it was triggered in the last 5 minutes (or so)?
Just to make it clearer: if some arbitrary time the log delta is:
line1
line2
error1
line3
error2
line4
error3
I want it to trigger only once (or alternatively, configure the action to only once per hour)
Thanks!
edit:
since I want to know if any error has occurred, last() does not seem to help

Resources