Detect missing label in Prometheus by comparing it to yesterday - monitoring

I have a long list of devices in Prometheus and I'm looking for a way to set up an alert that will show exactly which device is missing.
The metric (simplified) looks like this:
device{name="server1"}
device{name="server2"}
etc
A query like this will indicate that one or more devices are missing:
count(device) - count(device offset 1d)
However, I would then have to go through them manually to find exactly which one is missing. Is there a way to compare the "name" labels and show the missing ones in the alert?

You can try experimenting with something like:
device{} offset 1d unless device{}
This will return all time series of the device metric as of 1d ago that have no counterpart right now. You can then alert on individual devices if needed.
Or, if that is the preference, you can alert on count(...) > 0 over the above expression and then use the fact that Prometheus supports executing queries in the templates for alert labels/annotations to put a list of the missing devices in, for example, the description annotation (or whichever annotation the receiver of your alerts uses). This is shown here.
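For the per-device variant, an alerting rule file could look roughly like this (a minimal sketch; the group name, alert name, "for" duration and annotation wording are placeholders, not from the question):
groups:
  - name: device-presence
    rules:
      - alert: DeviceMissing
        # Fires one alert per device that had samples 1d ago but has none now.
        expr: 'device{} offset 1d unless device{}'
        for: 5m
        labels:
          severity: warning
        annotations:
          description: 'Device {{ $labels.name }} was reporting yesterday but is missing now.'
The count(...) > 0 variant would instead be a single alert whose description annotation runs the unless expression through the query template function to list all missing names.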

Related

Grafana: Panel with time of last result

I have an elasticsearch instance that receives logs from multiple backup routines. I'd like to query ES for these logs from Grafana and set up a panel that shows the last time for the different backups. Ideally I would also like to be able to show this in color if the time is longer than a certain threshold.
Basically the idea is to have a display that shows, for instance, green if a certain backup has been completed in the last 24 hours, and red if it hasn't.
How would I do this in Grafana with ES as the datasource?
The exact implementation depends on the panel used.
Example for the Singlestat panel: write the ES query and then select Stat: Time of last point; you may need to select a suitable unit/format.
Unfortunately, Grafana doesn't understand thresholds in your requested time format (older than 24 hours). You will need to return it as a metric (for example, the age of the last backup in seconds), which means writing a query for that. You will then have two stats to show (last time + age), so you won't be able to use Singlestat. A table panel will probably be better - you can apply thresholds based on the age metric there.
In addition to the great answer by Jan Garaj, it looks like there is work being done to make this type of thing much easier in the future.
Check out this issue to track progress.

Grafana + Prometheus: Display single stat of how often an event occurred

How do I use Prometheus + Grafana to tell how many times an event occurs
during a given time period?
I have a Prometheus counter that I increment every time this event happens. I would like to display it as a Singlestat number. It seems like this should be as simple as:
sum(increase(some_event_happened{application="example-app"}[$__range]))
And the display set to "Current" value.
However, this gives numbers that are much higher than the actual number of events in the given range. Also, it seems to vary based on how much I offset the range, and how large the range is.
More importantly, it crashes our Prometheus server with an out of memory error when I have three or four of these on a single dashboard.
I've tried setting up a recording rule to address the crashes, but I haven't figured out the right way to slice up the recording rule and still be able to display the Grafana range.
So in summary, I want a Singlestat displaying the number of times an event happened in the current time range set in the Grafana dashboard. It seems like this is a very basic thing for a monitoring system. Am I just using the wrong approach?
I've encountered similar issues and they appear to be due to discrepancies between the query interval (in Prometheus) and the min step (in Grafana). Try using this global, built-in variable for your interval, which will make sure Prometheus is always in sync with the Grafana step: $__interval.
sum(increase(some_event_happened{application="example-app"}[$__interval]))
http://docs.grafana.org/reference/templating/
https://www.stroppykitten.com/technical/prometheus-grafana-statistics
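On the recording-rule idea from the question, a precomputed fixed-window increase could look roughly like this (a sketch; the rule name and the 1h window are assumptions). A fixed window won't follow the dashboard's $__range by itself, so the $__interval fix above is still the simpler route:
groups:
  - name: event-counts
    rules:
      # Precompute the hourly increase so dashboards don't re-run the heavy query.
      - record: application:some_event_happened:increase1h
        expr: 'sum by (application) (increase(some_event_happened[1h]))'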

In InfluxDB/Telegraf How to compute difference between 2 fields based on 3rd field

I have the following use case:
We have a system that computes different response time metrics for messages that we want to insert in InfluxDB. This system writes JSON entries to a file.
We use telegraf with JSON plugin to extract the fields we want and insert into InfluxDB.
So far so good.
But we have an issue with one particular piece of information.
The system emits messages where mId is the unique identifier; in the examples below we have two, uuidXXXX and uuidYYYY:
{"meta1":"value", "mId":"uuidXXXX", "resTime1":1232332233, "timeWeEnterBus":startTimestamp}
{"meta1":"value2", "mId":"uuidYYYY", "resTime1":1232331111, "timeWeEnterBus":startTimestamp}
{"meta1":"value", "mId":"uuidXXXX", "resTime1":1232332233, "timeWeExitBus":endTimestamp}
{"meta1":"value2", "mId":"uuidYYYY", "resTime1":1232331111, "timeWeExitBus":endTimestamp}
And what we want here is to graph timeInBus, which is equal to "timeWeExitBus - timeWeEnterBus" for each unique mId.
So my questions are:
In my understanding, uuid would be a field, not a tag, as its cardinality is unbounded; the same goes for timeWeExitBus and timeWeEnterBus, which would be numeric fields since we want to use functions on them. And timeInBus would be the measurement. Am I right?
Is this a good use case for Influx/Telegraf, or are we misusing them for this? In my understanding, it doesn't look like a good idea to compute this on the Telegraf side, but I don't see how to do it in InfluxDB either. I initially thought the ELAPSED function could help, but I ended up thinking it doesn't work here.
If it is a good use case, could you point me to documentation that would help implement this?

Grafana Alerting when there is no change in data for x minutes

I've been rolling around the web and forums but cannot find a resource on this.
What I aim to achieve is to create an alert for when there is no change in the data for a period of time.
We are monitoring open files for our webserver(s), so this number fluctuates rather often. We noticed that when the number is stagnant it points to an issue on the server. So what we want is: if openfiles remains at X for 2 minutes, alert us.
I made such an alert through a small succession of steps:
I have a dedicated 'alerting dummy board' for all the alerts, since I can only have one alert per graph (Grafana version 6.6.0).
I use the following query: avg_over_time(delta(Sensor_Data[1m])[20s:]) - this calculates the 20s average of the 1-minute delta (last value minus first value over that interval).
My data-gathering program feeds into Prometheus, and this in turn into Grafana. If this program freezes, it might keep sending the last value to Prometheus, and the above query will drop to strictly zero.
So I have an alert which goes off if the above query stays within the range (-0.01, 0.01) for a minute (a typical value of the query with the system running is abs(query) > 0.18).
Thus, Grafana sends an alert if the Sensor_Data value does not change within about 2-3 minutes.
If you do use Prometheus and Alertmanager, there is a nice function that worked for me:
changes
So using something like this in an alerting rule will trigger if there are no changes over the time interval:
changes(metric_name[5m]) == 0
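A minimal sketch of how that expression could sit in a Prometheus alerting rule (the alert name, durations and severity are placeholders; metric_name stands in for your metric):
groups:
  - name: stagnant-data
    rules:
      - alert: MetricNotChanging
        # Fires when the metric has shown zero changes over the last 5 minutes.
        expr: 'changes(metric_name[5m]) == 0'
        for: 2m
        labels:
          severity: warning
        annotations:
          description: 'metric_name has not changed for at least 5 minutes on {{ $labels.instance }}.'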
This has worked for me. Make sure you're using a rate or increase function (no change means it will drop to zero) and filter the query like the following:
increase(metric_name[5m]) > 0
Then, in Alert Config, set "If no data or all values are null" to "Alerting". That way, when there's no data, the alert will be triggered.

Lowest value from 2 payloads in Node-Red

I have an IoT system at home and two temperature sensors.
One of the sensors may be in direct sun during some hours.
The real temperature is always the lower value, so sometimes temp1, sometimes temp2.
What I want to achieve is:
read the temperature from sensor1 (via MQTT)
read the temperature from sensor2 (via MQTT)
compare the values
find the lower one and send it via MQTT
go back to reading in a loop
For this example I can simulate the readings with inject nodes.
How can I do that? I am new to Node-RED and have tried but without success.
Here is my flow:
[{"id":"fa6372cc.47f92","type":"tab","label":"Flow 8","disabled":false,"info":""},{"id":"5ac90e03.22da3","type":"join","z":"fa6372cc.47f92","name":"","mode":"custom","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"","joinerType":"str","accumulate":true,"timeout":"","count":"2","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"","reduceFixup":"","x":990,"y":340,"wires":[["f09774bf.3c8428","a197b84d.6a7338"]]},{"id":"f09774bf.3c8428","type":"debug","z":"fa6372cc.47f92","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":1130,"y":340,"wires":[]},{"id":"43900e79.98cd8","type":"change","z":"fa6372cc.47f92","name":"set payload value","rules":[{"t":"set","p":"payload","pt":"msg","to":"req.params.value","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":790,"y":340,"wires":[["5ac90e03.22da3"]]},{"id":"b71d9143.c03bd","type":"change","z":"fa6372cc.47f92","name":"set topic temp1","rules":[{"t":"set","p":"topic","pt":"msg","to":"temp1","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":560,"y":320,"wires":[["43900e79.98cd8"]]},{"id":"e87114aa.6cd1","type":"change","z":"fa6372cc.47f92","name":"set topic temp2","rules":[{"t":"set","p":"topic","pt":"msg","to":"temp2","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":560,"y":360,"wires":[["43900e79.98cd8"]]},{"id":"783c47fd.8dd58","type":"inject","z":"fa6372cc.47f92","name":"temp source 2","topic":"","payload":"12","payloadType":"num","repeat":"3","crontab":"","once":false,"onceDelay":"1.5","x":380,"y":360,"wires":[["e87114aa.6cd1"]]},{"id":"271dedab.aaa7b2","type":"inject","z":"fa6372cc.47f92","name":"temp source 1","topic":"","payload":"10","payloadType":"num","repeat":"2","crontab":"","once":false,"onceDelay":"1","x":380,"y":320,"wires":[["b71d9143.c03bd"]]},{"id":"a197b84d.6a7338","type":"mqtt out","z":"fa6372cc.47f92","name":"temperature","topic":"domoticz/in","qos":"","retain":"","broker":"7e3561ec.acad","x":1150,"y":280,"wires":[]},{"id":"7e3561ec.acad","type":"mqtt-broker","z":"","name":"Domoticz","broker":"192.168.6.11","port":"8084","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthRetain":"false","birthPayload":"","closeTopic":"","closeRetain":"false","closePayload":"","willTopic":"","willQos":"0","willRetain":"false","willPayload":""}]
One way to do it would be like this:
This stores the two temps in flow variables - the flow initially sets them both to a high number so the "min" in "choose lower value" will work later. In this case I've used a Change node that sets the payload to the JSONata expression
$min([$flowContext("temp1"), $flowContext("temp2")])
but there are a few ways you could choose to do it.
Here is the code to try:
[{"id":"6bc2755e.9feb9c","type":"debug","z":"f454a93f.0e89d8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":990,"y":340,"wires":[]},{"id":"38bd03eb.f7d06c","type":"change","z":"f454a93f.0e89d8","name":"choose lower value","rules":[{"t":"set","p":"payload","pt":"msg","to":"$min([$flowContext(\"temp1\"), $flowContext(\"temp2\")])\t","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":790,"y":340,"wires":[["6bc2755e.9feb9c"]]},{"id":"9066677f.eb0358","type":"change","z":"f454a93f.0e89d8","name":"store temp1","rules":[{"t":"set","p":"temp1","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":550,"y":320,"wires":[["38bd03eb.f7d06c"]]},{"id":"a70c9b2a.e7db58","type":"change","z":"f454a93f.0e89d8","name":"store temp2","rules":[{"t":"set","p":"temp2","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":550,"y":360,"wires":[["38bd03eb.f7d06c"]]},{"id":"4bd27616.d022c8","type":"inject","z":"f454a93f.0e89d8","name":"temp source 2","topic":"","payload":"12","payloadType":"num","repeat":"","crontab":"","once":false,"onceDelay":"1.5","x":370,"y":360,"wires":[["a70c9b2a.e7db58"]]},{"id":"7378dd4f.3825b4","type":"inject","z":"f454a93f.0e89d8","name":"temp source 1","topic":"","payload":"10","payloadType":"num","repeat":"","crontab":"","once":false,"onceDelay":"1","x":370,"y":320,"wires":[["9066677f.eb0358"]]},{"id":"314eb0ec.85211","type":"inject","z":"f454a93f.0e89d8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":true,"onceDelay":0.1,"x":370,"y":260,"wires":[["688646b.138a6b8"]]},{"id":"688646b.138a6b8","type":"change","z":"f454a93f.0e89d8","name":"set to high","rules":[{"t":"set","p":"temp1","pt":"flow","to":"999","tot":"num"},{"t":"set","p":"temp2","pt":"flow","to":"999","tot":"num"}],"action":"","property":"","from":"","to":"","reg":false,"x":550,"y":260,"wires":[[]]}]
