I would like to count the number of data points in Kapacitor and send an alert if none have arrived in the last hour. The deadman doesn't work because it evaluates fixed time windows of x minutes: with a 1h window it checks whether measurements arrived in that range (e.g. if no data points arrive between 12:00 and 13:00 it alerts at 13:00, but if I send a measurement at 13:01 the next alert can only fire at 14:00). I would like the window to start at the time of the last telemetry.
I thought of using batch instead of stream but I can't figure out how to count the data points (I would like to have something like this)
|alert()
.crit(lambda: "count_value" == 0.0)
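One way to get a window that starts at the last telemetry, rather than on a fixed boundary, is to remember the timestamp of the most recent point and compare it with the current time. The following is a minimal Python sketch of that idea only; it is not Kapacitor code, and the class name `LastSeenDeadman` and its methods are hypothetical.

```python
from datetime import datetime, timedelta


class LastSeenDeadman:
    """Alert when no point has arrived within `timeout` of the last one.

    Hypothetical helper for illustration; not part of Kapacitor.
    """

    def __init__(self, timeout: timedelta, started_at: datetime):
        self.timeout = timeout
        # Until the first point arrives, measure silence from the start time.
        self.last_seen = started_at

    def observe(self, timestamp: datetime) -> None:
        # Each arriving point restarts the countdown.
        if timestamp > self.last_seen:
            self.last_seen = timestamp

    def is_critical(self, now: datetime) -> bool:
        # Critical once a full timeout has elapsed since the last point.
        return now - self.last_seen >= self.timeout


start = datetime(2024, 1, 1, 12, 0)
deadman = LastSeenDeadman(timedelta(hours=1), start)
deadman.observe(datetime(2024, 1, 1, 13, 1))             # point at 13:01
print(deadman.is_critical(datetime(2024, 1, 1, 14, 0)))  # 59 minutes of silence
print(deadman.is_critical(datetime(2024, 1, 1, 14, 1)))  # 60 minutes of silence
```

With a point at 13:01, the check stays quiet at 14:00 and turns critical at 14:01, because the hour is measured from the last point rather than from a window edge.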
I have a situation where I'm trying to count the number of files loaded into the system I am monitoring. I'm sending a "load time" metric to Datadog each time a file is loaded, and I need to send an alert whenever an expected file does not appear. To do this, I was going to count up the number of "load time" metrics sent to Datadog in a 24 hour period, then use anomaly detection to see whether it was less than the normal number expected. However, I'm having some trouble finding a way to consistently pull out this count for use in the alert.
I can't use the count_nonzero function, as some of my files are empty and have a load time of 0. I do know about .as_count() and count:metric{tags}, but I haven't found a way to include an evaluation interval with either of these. I've tried using .rollup(count, time) to count up the metrics sent, but this call seems to return variable results based on the rollup interval. For instance, if I compare intervals of 2000 and 4000 seconds, I would expect each 4000 second interval to count up about the sum of two 2000 second intervals over the same time period. This does not seem to be what happens at all - the counts for the smaller intervals seem to add up to much more than the count for the larger one. Additionally some rollup intervals display decimal numbers as counts, which does not make any sense to me if this function is doing what I thought it was.
Does anyone have any thoughts on how to accomplish this? I'd really appreciate any new ideas.
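The expectation described above, that two adjacent 2000-second counts should add up to the matching 4000-second count, holds whenever every raw event is counted exactly once in epoch-aligned buckets. Here is a small Python sketch of that arithmetic with made-up timestamps. It makes no claim about how Datadog's `.rollup()` aggregates internally; the function name `count_per_bucket` is hypothetical.

```python
from collections import Counter


def count_per_bucket(timestamps, bucket_seconds):
    """Count events in fixed, epoch-aligned buckets of `bucket_seconds`."""
    counts = Counter()
    for ts in timestamps:
        bucket_start = (ts // bucket_seconds) * bucket_seconds
        counts[bucket_start] += 1
    return dict(counts)


# Hypothetical event timestamps in seconds, one event per file load.
# A load time of 0 would still count, because only arrivals are counted.
events = [10, 500, 1999, 2000, 2500, 3999, 4000, 7000]

small = count_per_bucket(events, 2000)
large = count_per_bucket(events, 4000)
print(small)
print(large)
```

If the counts returned by a monitoring query do not add up this way, or come back as decimals, the query is probably not counting raw events, which is worth checking before building an alert on it.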
The code below registers five metrics (count, oneMinuteRate, fiveMinuteRate, fifteenMinuteRate, meanRate) in Graphite every 30 seconds from my application.
public void collectMetric(String metricName, long metricValue) {
    // mr is the application's MetricRegistry; mark() adds metricValue events to the meter
    mr.meter(metricName).mark(metricValue);
}
I would like to show in the Grafana dashboard the number of requests received every minute (i.e. if 60 are received in the first minute and 120 in the second, the graph shows 60 and then 120). The count in the meter metric above just keeps increasing, and all the *Rate values are events per second, so I am not sure how to get a metric into the Grafana dashboard that displays the number of requests received per minute. Any advice is highly appreciated.
Suppose I use
mr.counter(metricName).inc(metricValue) instead: is there a way to reset the counters every minute?
I had the same problem. The way that I found is that I resolved this in Grafana.
When you're on the panel's metrics, you can add a function to your query like this:
You can try the derivative() function or the perSecond() function, but these functions are not completely reliable; it depends on what you're doing with them in your panel.
With these you'll see the number of inputs over time rather than the running total.
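The idea behind a derivative-style function is to subtract consecutive samples of the ever-increasing count. A minimal Python sketch of that calculation follows. It assumes cumulative samples reported every 30 seconds as in the question, and it treats a drop in the count as a counter reset to zero; both the function name `per_minute_increase` and the sample values are hypothetical, and this is not the Graphite implementation.

```python
def per_minute_increase(samples):
    """Turn cumulative (timestamp_seconds, count) samples into per-minute totals.

    Each increase is credited to the minute in which its interval started.
    A drop in the count is assumed to be a reset to zero (an assumption of
    this sketch), so the new value itself is taken as the increase.
    """
    totals = {}
    for (prev_ts, prev), (_, curr) in zip(samples, samples[1:]):
        delta = curr - prev if curr >= prev else curr
        minute_start = (prev_ts // 60) * 60
        totals[minute_start] = totals.get(minute_start, 0) + delta
    return totals


# Hypothetical cumulative counts sampled every 30 seconds.
samples = [(0, 0), (30, 25), (60, 60), (90, 130), (120, 180)]
print(per_minute_increase(samples))
```

For these samples the result is 60 requests in the first minute and 120 in the second, while the raw count only ever shows the running total.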
Been rolling around the web and forums, cannot find a resource on this.
What I am trying to achieve is an alert for when there is no change in the data for a period of time.
We are monitoring open files on our web servers, so this number fluctuates rather often. We noticed that when the number is stagnant it points to an issue on the server. So what we want is: if the open-files value remains at X for 2 minutes, alert us.
I made such an alert through a small succession of things:
I have an exclusive 'alerting dummy board', for all the alerts, since I can only have one alert per graph (grafana version 6.6.0)
I use the following query: avg_over_time(delta(Sensor_Data[1m])[20s:]) - this calculates the 20s average of the difference between the last and first values of each 1-minute interval
My data gathering program feeds into prometheus and this in turn into grafana -- if this program freezes, it might continue sending the last value to prometheus, and the above query will drop to strictly zero.
so I have an alert which goes off if the above query is within a range (-0.01, 0.01) for a minute (a typical value of the above query with system running is abs(query) > 0.18)
Thus, Grafana sends an alert if the Sensor_Data value does not change within about 2-3 minutes.
If you use Prometheus and Alertmanager, there is a nice function that worked for me:
changes
Using something like this in an alert rule will trigger if there are no changes during the time interval:
changes(metric_name[5m]) == 0
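To make the logic of that check concrete, here is a minimal Python sketch that counts value changes inside a look-back window and flags the series as stagnant when there are none. It is an illustration only, not the Prometheus implementation; `is_stagnant`, the sample values, and the choice to require at least two samples are assumptions of this sketch.

```python
def changes(values):
    """Count how many times consecutive samples differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def is_stagnant(samples, now, window_seconds):
    """True if the value did not change within the last `window_seconds`.

    `samples` is a list of (timestamp_seconds, value) pairs in time order.
    This sketch requires at least two samples in the window, so that a
    single isolated sample is not reported as stagnation.
    """
    recent = [v for ts, v in samples if now - window_seconds < ts <= now]
    return len(recent) >= 2 and changes(recent) == 0


# Hypothetical open-file counts sampled every 30 seconds.
samples = [(0, 812), (30, 815), (60, 815), (90, 815), (120, 815)]
print(is_stagnant(samples, now=60, window_seconds=120))   # value moved 812 -> 815
print(is_stagnant(samples, now=120, window_seconds=120))  # flat at 815
```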
This has worked for me. Make sure you're using a rate or increase function over a range (no change means it will drop to zero) and filter the query like the following (the 5m range is only an example):
increase(metric_name[5m]) > 0
Then, in Alert Config, set "If no data or all values are null" to "Alerting". That way, when there's no data, the alert will be triggered.
I want to count the peak disc usage in a window of 5 mins.
I am new to TICKscript and Kapacitor. This is the sample code. I only want to count in the active window (not the emitted 2-minute window, even if it had some data points).
var curr = stream
    |from()
        .measurement('disk_usage_root_used_percentage')
    |window()
        .period(5m)
        .every(2m)
        .align()
    // here I want the count to happen
    |alert()
        .crit(lambda: "count" > 5)
        .log('/tmp/alerts.log')
Q:
How can I count the peak disc usage in a window of 5 minutes?
A:
When you specify period=5m and every=2m, Kapacitor buffers 5 minutes' worth of points and emits that buffer into the pipeline every 2 minutes.
So if the stream task runs for 10m, you'll find that your TICK script is executed 5 times in total.
For each emitted window, the dataset will consist of 3m of older data and 2m of newer data. The windows overlap, which is bad here because your use case is to analyse only the latest 5m of points and raise alarms if required, not to look back at old data. In other words, you don't want to be spammed by false alarms.
To correct it, specify .period(5m) and .every(5m) on the window node. The TICK script then runs twice in 10 minutes of uptime, with each run covering the latest 5 minutes of data.
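The overlap described above can be checked with a little arithmetic. The Python sketch below uses a simplified model in which a window is emitted at every multiple of `every` and covers the preceding `period` minutes; Kapacitor's actual emission timing and alignment may differ, and the function names are hypothetical.

```python
def window_bounds(period, every, total):
    """Return the (start, stop) minute bounds of each emitted window.

    Simplified model: one emission at every multiple of `every` up to
    `total`, each covering the last `period` minutes seen so far.
    """
    return [(max(0, t - period), t) for t in range(every, total + 1, every)]


def overlap(a, b):
    """Minutes shared by two (start, stop) windows."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


sliding = window_bounds(period=5, every=2, total=10)
tumbling = window_bounds(period=5, every=5, total=10)
print(sliding)
print(tumbling)
print(overlap(sliding[-2], sliding[-1]))
print(overlap(tumbling[0], tumbling[1]))
```

In this model the 5m/2m setting emits five windows in ten minutes, with consecutive full windows sharing three minutes of data, while the 5m/5m setting emits two windows that share nothing.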
Let me know if this helps.
I am using Complex Event Processing (Esper) to provide real-time candlestick calculations in my system. Calculating the values works fine, but I find it difficult to ensure that each candle window starts on a full minute (for a one-minute candle) and ends before the next minute starts (e.g. candle 1 [06:00:00.000 - 06:00:59.999], candle 2 [06:01:00.000 - 06:01:59.999], etc.).
Is there a pattern or command in Esper's query language that is able to provide such functionality?
I'd appreciate constructive comments and directions.
In Esper you can use a pattern to fire every minute at the zero second, i.e.
insert into TriggerEvent select * from pattern[every timer:interval(1 min)]
// named window to hold candle data, compute next candle
on TriggerEvent select * from NamedWindowCandle ....
// delete old data
on TriggerEvent delete from NamedWindowCandle
-rg
Local time often differs from exchange time, and there is also latency in delivering tick data. Minute bars are therefore often computed from the exchange timestamp, which must be extracted from the tick events. A new minute bar event is sent when the tick timestamps enter a new minute.
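Bucketing by the timestamp carried in each tick can be sketched outside Esper as follows. This is a minimal Python illustration, not EPL: it floors each tick's timestamp to the minute and keeps open, high, low and close per minute. The function name `minute_bars`, the tick tuples, and the prices are hypothetical, and ticks are assumed to arrive in timestamp order.

```python
from datetime import datetime


def minute_bars(ticks):
    """Build OHLC bars keyed by minute from (exchange_timestamp, price) ticks.

    Ticks are assumed to be in timestamp order. The bar a tick belongs to is
    decided by the tick's own timestamp, not by the local clock.
    """
    bars = {}
    for ts, price in ticks:
        minute = ts.replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            # First tick of a new minute opens a new bar.
            bars[minute] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
    return bars


# Hypothetical ticks around a minute boundary.
ticks = [
    (datetime(2024, 1, 1, 6, 0, 0, 0), 100.0),
    (datetime(2024, 1, 1, 6, 0, 30, 500000), 101.5),
    (datetime(2024, 1, 1, 6, 0, 59, 999000), 99.0),
    (datetime(2024, 1, 1, 6, 1, 0, 0), 99.5),
]
for minute, bar in minute_bars(ticks).items():
    print(minute, bar)
```

The tick at 06:00:59.999 still lands in the 06:00 bar, and the tick at 06:01:00.000 opens the next one, regardless of when either is actually received.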