Create InfluxDB v2 graph for increasing counter value - influxdb

I am using the SNMP plugin for the Telegraf agent, to gather a counter value, such as udpInDatagrams.
The data points are being written to InfluxDB v2, running in a Linux container, using Docker.
The counter value always increases, but never decreases.
Right now, my graph shows the counter value increasing, but never decreasing. My objective is to see the value of the counter per-period, rather than an absolute / lifetime value.
In other words, each value on the graph should be:
current_value - last_value
Question: How do I create an InfluxDB v2 graph that shows the value of the counter per-period (ie. 10 seconds, 1 minute, 5 minutes), instead of constantly increasing in value?

With Flux there is the derivative() function and with InfluxQL, there is the NON_NEGATIVE_DERIVATIVE() function, which is in this case preferred over the DERIVATIVE() function as the snmp counter can also overflow (reset).

Related

How long Prometheus timeseries last without and update

If I send a gauge to Prometheus then the payload has a timestamp and a value like:
metric_name {label="value"} 2.0 16239938546837
If I query it on Prometheus I can see a continous line. Without sending a payload for the same metric the line stops. Sending the same metric after some minutes I get another continous line, but it is not connected with the old line.
Is this fixed in Prometheus how long a timeseries last without getting an update?
I think the first answer by Marc is in a different context.
Any timeseries in prometheus goes stale in 5m by default if the collection stops - https://www.robustperception.io/staleness-and-promql. In other words, the line stops on graph (or grafana).
So if you resume the metrics collection again within 5 minutes, then it will connect the line by default. But if there is no collection for more than 5 minutes then it will show a disconnect on the graph. You can tweak that on Grafana to ignore drops but that not ideal in some cases as you do want to see when the collection stopped instead of giving the false impression that there was continuous collection. Alternatively, you can avoid the disconnect using some functions like avg_over_time(metric_name[10m]) as needed.
There is two questions here :
1. How long does prometheus keeps the data ?
This depends on the configuration you have for your storage. By default, on local storage, prometheus have a retention of 15days. You can find out more in the documentation. You can also change this value with this option : --storage.tsdb.retention.time
2. When will I have a "hole" in my graph ?
The line you see on a graph is made by joining each point from each scrape. Those scrape are done regularly based on the scrape_interval value you have in your scrape_config. So basically, if you have no data during one scrape, then you'll have a hole.
So there is no definitive answer, this depends essentially on your scrape_interval.
Note that if you're using a function that evaluate metrics for a certain amount of time, then missing one scrape will not alter your graph. For example, using a rate[5m] will not alter your graph if you scrape every 1m (as you'll have 4 other samples to do the rate).

Grafana + Prometheus: Display single stat of how often an event occurred

How do I use Prometheus + Grafana to tell how many time an event occurs
during a given time period?
I have a Prometheus counter that I increment every time this event happens. I would like to display it in a Singlestat number. It seems like this should be as simple as:
sum(increase(some_event_happened{application="example-app"}[$__range]))
And the display set to "Current" value.
However, this gives numbers that are much higher than the actual number of events in the given range. Also, it seems to vary based on how much I offset the range, and how large the range is.
More importantly, it crashes our Prometheus server with an out of memory error when I have three or four of these on a single dashboard.
I've tried setting a recorded rule to address the crashes, but I haven't figured out the right way to slice up the record rule to still be able to display the Grafana range.
So in summary, I want a Singlestat displaying the number of times an event happened in the current time range set in the Grafana dashboard. It seems like this is a very basic thing for a monitoring system. Am I just using the wrong approach?
I've encountered similar issues and they appear to be due to discrepancies between the query interval (in Prometheus) and the min step (in Grafana). Try using this global, built-in variable for your interval, which will make sure Prometheus is always in sync with the Grafana step: $__interval.
sum(increase(some_event_happened{application="example-app"}[$__interval]))
http://docs.grafana.org/reference/templating/
https://www.stroppykitten.com/technical/prometheus-grafana-statistics

How to graph individual Summary metric instances in Prometheus?

I'm using Prometheus' Summary metric to collect the latency of an API call. Instead of making an actual API call, I'm simply calling Thread.sleep(1000) to simulate a 1 second api-call latency value -- this makes the Summary hold a value of .01 (for 1 second of latency). But if, for example, I invoke Thread.sleep(1000) twice in the same minute, the Summary metric ends up with a value of .02 (for 2 seconds of latency), instead of two individual instances of .01 latency that just happened to occur within the same minute. My problem is the Prometheus query. The Prometheus query I am currently using is: rate(my_custom_summary_sum[1m]).
What should my Prometheus query be, such that I can see the latency of each individual Thread.sleep(1000) invocation. As of right now, the Summary metric collects and displays the total latency sum per minute. How can I display the latency of each individual call to Thread.sleep(1000) (i.e. the API request)?
private static final Summary mySummary = Summary.build()
.name("my_custom_summary")
.help("This is a custom summary that keeps track of latency")
.register();
Summary.Timer requestTimer = mySummary.startTimer(); //starting timer for mySummary 'Summary' metric
Thread.sleep(1000); //sleep for one second
requestTimer.observeDuration(); //record the time elapsed
This is the graph that results from this query:
Prometheus graph
Prometheus is a metrics-based monitoring system, it cares about overall performance and behaviour - not individual requests.
What you are looking for is a logs-based system, such as Graylog or the ELK stack.

Measure service latency with Prometheus

I am new to Prometheus and Grafana. My primary goal is to get the response time per request.
For me it seemed to be a simple thing - but whatever I do I do not get the results I require.
I need to be able to analyse the service latency in the last minutes/hours/days. The current implementation I found was a simple SUMMARY (without definition of quantiles) which is scraped every 15s.
Is it possible to get the average request latency of the last minute from my Prometheus SUMMARY?
If YES: How? If NO: What should I do?
Currently I am using the following query:
rate(http_response_time_sum{application="myapp",handler="myHandler", status="200"}[1m])
/
rate(http_response_time_count{application="myapp",handler="myHandler", status="200"}[1m])
I am getting two "datasets". The value of the first is "NaN". I suppose this is the result from a division by zero.
(I am using spring-client).
Your query is correct. The result will be NaN if there have been no queries in the past minute.

Offset graphite metrics by the lowest value in current time range

I have Grafana with Graphite metrics. I have a graph showing the EnqueueCount of some specific queue in ActiveMQ. The problem is that the EnqueueCount shows all values since the queue was created, so when I narrow down the time range in Grafana to "today so far", the graph looks like this:
I would like it to show only values for current period - I would like the graph to always start at 0. In this case I would like to offset it by -2. There is an offset function, however it is only by constant, while I would need something like "offset by lowest value in time period".
I went through Graphite documentation, but cannot find any function which would allow me to do this.
Any ideas how I could achieve this?
Versions we use:
Grafana v4.2.0 (commit: 349f3eb)
graphite-web-0.9.12-5
python-carbon-0.9.12-3
Please use nonNegativeDerivative() function - then you will get a rate of EnqueueCount change in (your metric interval, usually it is) minute. If you want to get count again - use integral().
So, integral(nonNegativeDerivative(EnqueueCount)) - but usually people looking for rate, then derivative is enough.

Resources